skrape.it
Add a fetcher that uses a real Chrome browser to download the HTML
Adds a new Fetcher that uses a real Chrome browser to fetch the HTML. This solved a problem where none of the existing fetchers could fetch a page that was partially generated by JavaScript. (I assume the page requires a modern, real browser, but I did not investigate why.)
This change uses the cdt-java-client library to launch and communicate with a Chrome browser: https://github.com/kklisura/chrome-devtools-java-client
However, because of a breaking change in Chrome that has not yet been fixed in that library, I am using a fork with that one patch applied: io.fluidsonic.mirror:cdt-java-client:4.0.0-fluidsonic-1. Hopefully the fix gets merged back into the main library.
WIP warning: I am publishing this PR in its current state in case it helps anyone else. However, it does not yet fulfill all the expectations of a fetcher: it does not return the correct HTTP status etc., just the body. There is a Network class that can probably be used to extract those.
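One possible way to close the status gap: the Chrome DevTools protocol reports each response through a `Network.responseReceived` event that carries a resource type and an HTTP status. The sketch below is not part of this PR and does not call cdt-java-client. `ResponseEvent` and `MainDocumentStatus` are hypothetical names that only model the selection step in plain Java, assuming the top-level page is the first response typed `Document`.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in for a DevTools "Network.responseReceived" event.
// The real event carries many more fields; only the ones used here are modelled.
final class ResponseEvent {
    final String resourceType; // e.g. "Document", "Script", "Image"
    final String url;
    final int status;

    ResponseEvent(String resourceType, String url, int status) {
        this.resourceType = resourceType;
        this.url = url;
        this.status = status;
    }
}

// Hypothetical helper: picks the HTTP status a fetcher could report for the page.
final class MainDocumentStatus {
    private MainDocumentStatus() {}

    // Assumption: the top-level page is the first response typed "Document";
    // later "Document" responses (for example iframes) are ignored.
    static Optional<Integer> of(List<ResponseEvent> events) {
        for (ResponseEvent event : events) {
            if ("Document".equals(event.resourceType)) {
                return Optional.of(event.status);
            }
        }
        // No document response was observed, so the status is unknown.
        return Optional.empty();
    }
}
```

In a real fetcher the events would come from the library's Network listener rather than from a hand-built list, and an empty result would need a fallback of some kind.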
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 89.56%. Comparing base (382f21b) to head (475065d).
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #237   +/-   ##
=======================================
  Coverage   89.56%   89.56%
=======================================
  Files          38       38
  Lines         986      986
  Branches       69       69
=======================================
  Hits          883      883
  Misses         81       81
  Partials       22       22