scala-scraper

A Scala library for scraping content from HTML pages

Results: 17 scala-scraper issues, sorted by most recently updated.

Updates [org.scoverage:sbt-scoverage](https://github.com/scoverage/sbt-scoverage) from 2.0.0 to 2.0.2. [GitHub Release Notes](https://github.com/scoverage/sbt-scoverage/releases/tag/v2.0.2) - [Version Diff](https://github.com/scoverage/sbt-scoverage/compare/v2.0.0...v2.0.2) I'll automatically update this PR to resolve conflicts as long as you don't change it yourself. If you'd...
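
For context, a bump like this usually amounts to a one-line change in the build definition, along these lines (illustrative only, not taken from this PR's diff; the file path is the conventional one):

```scala
// project/plugins.sbt
// Bump sbt-scoverage from 2.0.0 to 2.0.2
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "2.0.2")
```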

Just like with https://github.com/SeleniumHQ/selenium/wiki/ChromeDriver, we could use a real browser.

@ruippeixotog You did a great job with this scraper! Is there a way to extract the content that a page would load asynchronously after it finishes its natural rendering?
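
One possible approach, assuming the dynamic content comes from JavaScript that HtmlUnit can execute, is to use the library's `HtmlUnitBrowser` instead of `JsoupBrowser`. A minimal sketch (the URL and selector are placeholders):

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// HtmlUnitBrowser backs pages with HtmlUnit, which can execute JavaScript,
// so content added after the initial HTML load may be visible to extractors.
val browser = HtmlUnitBrowser()
val doc = browser.get("https://example.com/dynamic-page") // placeholder URL
val items = doc >> texts(".loaded-async")                 // placeholder selector
```

Whether this picks up a given piece of async content depends on what HtmlUnit can execute for that page.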

Before, `.siblings` would be inferred to be an `Iterable[Element#ThisType]` instead of an `Iterable[JsoupElement]`. The former lacks a lot of functionality.
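
A rough sketch of why the concrete element type matters, assuming a typed `JsoupBrowser` (the URL is a placeholder): with `Iterable[JsoupElement]`, Jsoup-specific members such as `underlying` stay reachable.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

// With a typed browser, documents and elements keep their Jsoup-specific types.
val doc = JsoupBrowser.typed().get("https://example.com") // placeholder URL

// If siblings is Iterable[JsoupElement], the underlying org.jsoup.nodes.Element
// of each sibling remains accessible.
val firstChild = doc.root.children.head
val siblingTags = firstChild.siblings.map(_.underlying.tagName)
```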

- Updated copyright year
- Added links to CSS selector resources

Running `browser.get()` or `browser.post()` on Heroku keeps returning `org.jsoup.HttpStatusException`. ![bug-heroku](https://user-images.githubusercontent.com/23453888/80260477-e9d15480-867f-11ea-929a-b32a07dbf144.png)
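
One thing worth ruling out (an assumption, since the exact status code isn't stated here) is the target site rejecting requests with the default Jsoup user agent; `JsoupBrowser` can be constructed with a custom one. A minimal sketch (the user agent string and URL are placeholders):

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

// The default user agent identifies the client as jsoup; some sites answer it
// with an error status, which surfaces as HttpStatusException.
val browser = new JsoupBrowser("Mozilla/5.0 (compatible; my-scraper/1.0)")
val doc = browser.get("https://example.com") // placeholder URL
```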

I tried to parse a big table element with `ContentExtractors.table`, but the `buildRow` and `buildTable` methods are not tail-recursive. As a result, `ContentExtractors.table` threw a `StackOverflowError` and failed to parse this URL: http://www.tipness.co.jp/schedule/SHP063/month
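
Until the extractor is made stack-safe, a possible workaround (a sketch only, assuming a plain `<table>` with `<tr>`/`<td>`/`<th>` rows) is to collect the rows with the generic extractors instead of `ContentExtractors.table`:

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

val browser = JsoupBrowser()
val doc = browser.get("http://www.tipness.co.jp/schedule/SHP063/month")

// Gather each row's cell texts with List-based extractors, sidestepping the
// deep non-tail recursion in the table extractor.
val rows: List[List[String]] =
  (doc >> elementList("table tr")).map(row => (row >> texts("th, td")).toList)
```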

When parsing pages in a foreign language - a common use case for this library - it is sometimes necessary to parse dates formatted in another locale (e.g. different month...
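
In the meantime, one way to handle this outside the library (a sketch; the URL, selector, pattern, and locale are placeholders) is to extract the raw text and parse it with `java.time` using an explicit `Locale`:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

val doc = JsoupBrowser().get("https://example.com/article") // placeholder URL

// Month names like "août" only parse correctly with the matching locale.
val frenchDates = DateTimeFormatter.ofPattern("d MMMM yyyy", Locale.FRENCH)
val published = LocalDate.parse(doc >> text(".published"), frenchDates)
```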

It would be nice if the browsers had an asynchronous version of `get` - this way you could just do several page loads at once. As a workaround, can I use...
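
A common workaround, assuming blocking calls on an execution context are acceptable, is to wrap the synchronous `get` in `Future`s. A sketch (the URLs are placeholders):

```scala
import scala.concurrent.{ExecutionContext, Future}

import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

implicit val ec: ExecutionContext = ExecutionContext.global

val browser = JsoupBrowser()
val urls = List("https://example.com/a", "https://example.com/b") // placeholders

// Each blocking get runs on the execution context, so the page loads overlap.
val titles: Future[List[String]] =
  Future.traverse(urls)(url => Future(browser.get(url) >> text("title")))
```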

As you probably know, Chrome now supports headless mode (https://developers.google.com/web/updates/2017/04/headless-chrome), and one way to drive it is through WebDriver. Any plans for scala-scraper to support headless Chrome?
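
Until (or unless) that lands in the library, one way to combine the two today is to let headless Chrome render the page and hand the resulting HTML to scala-scraper. A sketch, assuming the Selenium Java bindings and chromedriver are available (the URL and selector are placeholders):

```scala
import org.openqa.selenium.chrome.{ChromeDriver, ChromeOptions}

import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Headless Chrome renders the page (including JavaScript); scala-scraper parses the result.
val options = new ChromeOptions().addArguments("--headless")
val driver = new ChromeDriver(options)
try {
  driver.get("https://example.com") // placeholder URL
  val doc = JsoupBrowser().parseString(driver.getPageSource)
  val heading = doc >> text("h1") // placeholder selector
  println(heading)
} finally driver.quit()
```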