decapitated
About this endeavour
With proper "headless Chrome" being "a thing" now (https://developers.google.com/web/updates/2017/04/headless-chrome), Chrome 59+ on anyone's system can be instrumented either at the cmdline or via the DevTools protocol. Note that this statement:
"At the moment, Phantom also provides a higher level API than the DevTools Protocol."
is on the linked web page, so I'm expecting the Chrome team to provide direct "webdriver" support or a higher-level JS API like phantomjs has.
Enabling individual R users to "just use" their own instance of Chrome takes obstacles like Docker (though this is a good image: https://github.com/ebidel/lighthouse-ci/blob/master/builder/Dockerfile) or virtual machines out of the equation, so I'm unlikely to go down the container/VM route. I'm also not keen on building a version of Chrome with "R" in it or R hooks in it, since that means One More Thing to download.
If and when webdriver support is added, this pkg might be moot. There's no guarantee of webdriver support, though.
Shorter-term goals are:
- auto-locate Chrome binary (where possible)
- enable full use of all available cmdline switches, including GPU support (where possible)
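The two short-term goals above can be sketched roughly as follows. This is an illustration in Python rather than R, and not this package's code: the candidate paths are common install locations but are assumptions, `HEADLESS_CHROME` is a hypothetical override variable, and `find_chrome` and `chrome_args` are made-up helper names. The `--headless`, `--disable-gpu` and `--dump-dom` switches used in the usage note come from the linked headless Chrome article.

```python
import os
import shutil
import sys

# Candidate locations/names are assumptions for illustration; real installs vary.
CANDIDATES = {
    "darwin": ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"],
    "win32": [
        r"C:\Program Files\Google\Chrome\Application\chrome.exe",
        r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
    ],
    "linux": ["google-chrome", "google-chrome-stable", "chromium-browser", "chromium"],
}


def find_chrome(env=os.environ, platform=sys.platform):
    """Return a path to a Chrome binary, or None if nothing is found."""
    override = env.get("HEADLESS_CHROME")  # hypothetical override variable
    if override:
        return override
    key = "linux" if platform.startswith("linux") else platform
    for candidate in CANDIDATES.get(key, []):
        if os.path.isabs(candidate):
            # Absolute path: only accept it if the file is really there.
            if os.path.exists(candidate):
                return candidate
        else:
            # Bare command name: look it up on the PATH.
            found = shutil.which(candidate)
            if found:
                return found
    return None


def chrome_args(binary, url, *switches):
    """Build an argument vector; switches are passed through untouched."""
    return [binary, "--headless", *switches, url]
```

With a binary in hand, something like `subprocess.run(chrome_args(path, "https://example.com", "--disable-gpu", "--dump-dom"), capture_output=True, text=True).stdout` should hand back the rendered DOM as text. Passing switches through verbatim keeps every cmdline option available without the wrapper having to know about each one.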
Longer-term goal is:
Depending on how much time I have (or if others want to pile on!), getting the Chrome DevTools protocol working for instrumentation is a goal. The protocol looks event-oriented, which may mean either dealing with C[++] or C-wrapped R callbacks, or making an R orchestration DSL that translates into DevTools protocol "commands" and then just gets the result back.
I personally only care about getting content back out. Unless someone who cares more about detailed instrumentation (say, for creating a test framework for htmlwidgets) jumps on, I'm solely focused on enabling easier JS-based web scraping (like I did with the splashr pkg).
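To make the "orchestration DSL" option a bit more concrete, here is a minimal sketch of the idea, written in Python for illustration only (an R version would presumably chain verbs with pipes instead). The `Script` class and its `go`/`evaluate` verbs are hypothetical names, not an existing API; `Page.navigate` and `Runtime.evaluate` are real DevTools protocol methods.

```python
from itertools import count


class Script:
    """Collects high-level steps as DevTools-style command dicts."""

    def __init__(self):
        self.commands = []
        self._ids = count(1)  # each command needs its own id

    def _add(self, method, **params):
        self.commands.append(
            {"id": next(self._ids), "method": method, "params": params}
        )
        return self  # returning self lets steps be chained

    def go(self, url):
        """Hypothetical verb: load a page."""
        return self._add("Page.navigate", url=url)

    def evaluate(self, expression):
        """Hypothetical verb: run JS and ask for the value itself."""
        return self._add(
            "Runtime.evaluate", expression=expression, returnByValue=True
        )
```

A call such as `Script().go("https://example.com").evaluate("document.title").commands` yields a plain list of commands that some transport layer could send one by one. That keeps the user-facing verbs separate from the event-oriented plumbing.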
(keeping running notes here)
Did a bit more investigation and I think the R DSL makes the most sense.
Am likely going to wrap https://github.com/dhbaird/easywsclient and see if I can't bang out something half-usable in short order.
Basic tests with wscat show it's super-easy to create the proper JSON DevTools websocket function calls that return immediate responses/JSON values. For the core "data gathering" tasks that would be the primary purpose of this pkg in R, such functionality is pretty straightforward.
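For a sense of what those JSON calls look like on the wire, here is a small sketch in Python, again for illustration only and with no websocket involved. `encode_call` and `result_for` are hypothetical helper names. The message shape (`id`, `method`, `params` going out; `id` plus `result` or `error` coming back; events arriving without an `id`) follows the DevTools protocol.

```python
import json


def encode_call(call_id, method, **params):
    """Serialize one DevTools call as the JSON text sent over the websocket."""
    return json.dumps({"id": call_id, "method": method, "params": params})


def result_for(call_id, raw_messages):
    """Return the 'result' payload of the reply matching call_id.

    Events (messages without an 'id') and replies to other calls are
    skipped. Returns None if no matching reply is present.
    """
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("id") != call_id:
            continue
        if "error" in msg:
            raise RuntimeError(msg["error"].get("message", "DevTools error"))
        return msg["result"]
    return None
```

Typing the output of `encode_call(1, "Runtime.evaluate", expression="document.title")` into wscat is the kind of basic test described above. `result_for` then picks the matching reply out of whatever else arrives on the socket.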