This post shows how to parse and extract data from HTML documents using Scala and jsoup in a functional programming style.
In this example we are going to scrape IMDB reviews for a given movie id.
We are going to use Jsoup to fetch and parse HTML documents.
Create an sbt project and add the jsoup dependency to your build.sbt.
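As a rough sketch, the dependency line could look like the one below. The version number is an assumption; check Maven Central for the release you actually want.

```scala
// build.sbt
// jsoup is a plain Java library, so a single % is used before the artifact name.
// The version below is an assumption; pick the current release.
libraryDependencies += "org.jsoup" % "jsoup" % "1.17.2"
```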
The Review class is the model that holds a single scraped review.
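A minimal sketch of such a model is shown here. The field names (`title`, `author`, `rating`, `body`) are assumptions made for illustration and may differ from the fields used in the complete code.

```scala
// Hypothetical model: every field name here is an assumption for illustration.
final case class Review(
  title: String,
  author: String,
  rating: Option[Int], // not every review carries a star rating
  body: String
)
```

A case class fits well here because it is immutable and gets structural equality and `copy` for free.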
Given the URL of a web page, this method returns a jsoup Document for that page.
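One way to write this is sketched below. The name `fetchDocument`, the user agent string, and the timeout are assumptions. The sketch also wraps the call in `Try`, which is a variation on the description above: it returns `Try[Document]` so that failures become values instead of exceptions.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import scala.util.Try

object Fetcher {
  // Fetch and parse the page at `url`. Network errors, HTTP errors and
  // malformed URLs are captured as a Failure instead of being thrown.
  def fetchDocument(url: String): Try[Document] =
    Try(
      Jsoup.connect(url)
        .userAgent("Mozilla/5.0") // assumption: some sites reject the default agent
        .timeout(10000)           // milliseconds
        .get()
    )
}
```

If you prefer the plain `Document` return type, drop the `Try` and let the exception propagate.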
This method extracts the data from a jsoup Element and creates a Review model from it.
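The extraction step might look roughly like this. The CSS selectors (`.title`, `.author`, `.rating`, `.content`), the `Review` fields, and the helper `textOf` are placeholders chosen for the sketch; they are not IMDb's real markup, so inspect the page and substitute the actual selectors.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Element

object ReviewParser {
  // Hypothetical model, repeated here so the sketch compiles on its own.
  final case class Review(title: String, author: String, rating: Option[Int], body: String)

  // Text of the first descendant matching `css`, or "" when nothing matches.
  // selectFirst returns null on no match, so it is wrapped in Option.
  private def textOf(el: Element, css: String): String =
    Option(el.selectFirst(css)).map(_.text()).getOrElse("")

  // All selectors below are placeholders, not the real page structure.
  def toReview(el: Element): Review =
    Review(
      title  = textOf(el, ".title"),
      author = textOf(el, ".author"),
      rating = textOf(el, ".rating").toIntOption, // None if missing or not a number
      body   = textOf(el, ".content")
    )
}
```

Missing fields fall back to an empty string or `None`, so one malformed review does not abort the whole scrape.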
This method loops over the individual elements in elements, where each element holds a single review. As annotated, the method is tail recursive, and it stops when count >= limit.
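The loop can be sketched as follows. To keep the example independent of the Review model, it is written generically: the name `collect`, its parameters, and the `demo` markup are assumptions for illustration.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Element
import scala.annotation.tailrec
import scala.jdk.CollectionConverters._

object ReviewLoop {
  // Walk `remaining`, converting each element with `parse`, until the list
  // is exhausted or `limit` results have been collected.
  @tailrec
  def collect[A](remaining: List[Element], limit: Int, parse: Element => A,
                 count: Int = 0, acc: List[A] = Nil): List[A] =
    remaining match {
      case head :: tail if count < limit =>
        collect(tail, limit, parse, count + 1, parse(head) :: acc)
      case _ =>
        acc.reverse // stop: count >= limit or no elements left
    }

  // Usage on an in-memory document (the markup is made up for this sketch).
  def demo(): List[String] = {
    val doc = Jsoup.parse("<ul><li>a</li><li>b</li><li>c</li></ul>")
    collect(doc.select("li").asScala.toList, limit = 2, parse = (e: Element) => e.text())
  }
}
```

The `@tailrec` annotation makes the compiler reject the method if the recursive call is not in tail position, so the loop is guaranteed to run in constant stack space. Results are prepended to the accumulator and reversed once at the end, which keeps each step constant time.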
We have now created all the required methods. All that remains is to create a jsoup Document using the method defined earlier and call scrapeReviewContainer, passing that document and the maximum number of reviews to fetch. This method returns a list of reviews.
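Putting the pieces together might look like the sketch below. Only the name `scrapeReviewContainer` and its two arguments come from the description above; the body, the trimmed-down `Review`, the selectors, and the sample markup are assumptions. An in-memory page stands in for the fetched document so the example runs offline.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.{Document, Element}
import scala.annotation.tailrec
import scala.jdk.CollectionConverters._

object Scraper {
  // Trimmed-down hypothetical model to keep the sketch short.
  final case class Review(title: String, body: String)

  // Placeholder selectors, not the real page structure.
  private def toReview(el: Element): Review =
    Review(el.select(".title").text(), el.select(".content").text())

  // Collect at most `limit` reviews from the document.
  def scrapeReviewContainer(doc: Document, limit: Int): List[Review] = {
    @tailrec
    def loop(rest: List[Element], count: Int, acc: List[Review]): List[Review] =
      rest match {
        case head :: tail if count < limit => loop(tail, count + 1, toReview(head) :: acc)
        case _                             => acc.reverse
      }
    loop(doc.select(".review").asScala.toList, 0, Nil)
  }

  // Made-up markup standing in for a fetched reviews page.
  val sampleHtml: String =
    """<div class="review"><a class="title">Great</a><div class="content">Loved it.</div></div>
      |<div class="review"><a class="title">Meh</a><div class="content">Too long.</div></div>
      |<div class="review"><a class="title">Bad</a><div class="content">Walked out.</div></div>""".stripMargin

  def demo(): List[Review] = scrapeReviewContainer(Jsoup.parse(sampleHtml), limit = 2)
}
```

Against a live page, the `Jsoup.parse(sampleHtml)` call would be replaced by the document returned from the fetch method.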
And we are done.
The complete code, with added functionality, is available here.