`Crawly.Fetchers.Fetcher` implementation for Playwright

Open Nezteb opened this issue 2 years ago • 4 comments

Currently crawly has an implementation for Splash: https://github.com/elixir-crawly/crawly/blob/5eeeb2a3ba230ee55d2411a64f9e426957dc8c40/lib/crawly/fetchers/splash.ex

I tend to use Playwright (or Puppeteer if I only care about Chromium) for browser automation and testing, so it'd be cool to be able to use some of its functionality from crawly.

The only thing I'm unsure of is whether or not Playwright exposes a requests page/API like Splash does:

Splash exposes the render.html endpoint which renders incoming requests sent with ?url get parameter.
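
For reference, the Splash flow quoted above boils down to a GET request against `render.html` with the target page passed in the `url` query parameter. Here is a minimal Python sketch of building that request URL, assuming Splash is listening on its default port 8050 on localhost. The helper name `splash_render_url` and the example target are made up for illustration and are not part of crawly or Splash:

```python
from urllib.parse import urlencode


def splash_render_url(target_url, base_url="http://localhost:8050", **options):
    """Build a Splash render.html URL for target_url.

    base_url is an assumption (Splash's default port on localhost).
    Extra keyword arguments are passed through as query parameters,
    e.g. wait=0.5.
    """
    params = {"url": target_url, **options}
    # urlencode percent-encodes the target URL so its own query string
    # does not get mixed up with Splash's parameters.
    return f"{base_url.rstrip('/')}/render.html?{urlencode(params)}"


print(splash_render_url("https://example.com/page?id=1", wait=0.5))
```

Fetching the resulting URL with any HTTP client returns the rendered HTML, which is the part a Playwright-based fetcher would need an equivalent for.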

I might end up picking this up, but I figured I'd create an issue beforehand. 😄

Nezteb avatar Mar 26 '23 18:03 Nezteb

Hard to say. I haven't had a chance to explore these two tools. In some of my previous projects PhantomJS was used for browser rendering, but it seems to be more or less dead now.

It would be interesting to see an example fetcher for Playwright or Puppeteer. Maybe we can add it to Crawly as a standard fetcher :) Just let me know how it goes!

oltarasenko avatar Mar 26 '23 20:03 oltarasenko

As a non-Elixir example, I just built a site scraper that uses Playwright to save each page as a PDF: https://github.com/Nezteb/scrape-pdf

Next weekend I'll see what I can do about a crawly fetcher for it!

Nezteb avatar Mar 26 '23 23:03 Nezteb

https://github.com/mechanical-orchard/playwright-elixir will probably be able to support what you are looking for.

dbrody avatar Apr 05 '23 02:04 dbrody

Oh nice, I'll check that out! I'll see if I can get a minimal demo of using crawly along with playwright-elixir as the fetcher!

Nezteb avatar Apr 05 '23 15:04 Nezteb