Question about suitability for web scraping

Open deabreu opened this issue 3 years ago • 0 comments

Hello all. Please forgive me if this is the wrong place to post this question.

I'm looking for a Scala alternative to Scrapy for parsing HTML documents for web scraping. I've been trying to build one using Jsoup, but since it is a pure Java library, adapting it to Scala has made development somewhat counterintuitive, and I'd like a more functional approach.

I've come across Pine as one such approach, but the project seems to be more focused on rendering than on building a data structure model from an existing document, which would be my main focus. If that impression is incorrect, please correct me.

With that in mind, could you please answer the following questions about the project or its documentation?

  1. Can Pine parse any existing HTML5-compliant document into a tree-like hierarchical structure? And can this structure be queried?
  2. Can Pine help me parse JavaScript code for dynamic sites? If so, could you point me to an example of how to get started, please?
  3. If not, could you suggest a possible way to work around this limitation, please?

deabreu, Aug 24 '21 12:08