scrapy-playwright
🎭 Playwright integration for Scrapy
I am facing an issue when using Chromium: when trying to download a PDF file, response.body is the viewer plugin HTML, not the PDF bytes. There's already a concerned fix...
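One way to confirm this symptom is to check whether the body starts with the `%PDF-` signature that PDF files begin with. The following is a minimal sketch, assuming the body is available as `bytes`; `looks_like_pdf` is a hypothetical helper for illustration, not part of scrapy-playwright.

```python
# Hypothetical diagnostic helper (not part of scrapy-playwright).
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(body: bytes) -> bool:
    """Return True if the body starts with the PDF file signature."""
    return body.startswith(PDF_SIGNATURE)
```

In a spider callback this could be called as `looks_like_pdf(response.body)`; a `False` result for a URL that should serve a PDF would match the behavior described above.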
Hey folks. At the company we work at, BuscoJobs, we applied Scrapy Playwright on 48 spiders. We have created a guideline (in Spanish) to help users get started with this...
I sometimes get this error when I use scrapy-playwright:
```
2023-03-31 09:33:35 [asyncio] ERROR: Task was destroyed but it is pending!
source_traceback: Object created at (most recent call last):
  File...
```
I start a Chrome browser on CDP port 40000 and set PLAYWRIGHT_CDP_URL = "http://localhost:40000" in my settings file, but every time Scrapy starts working, it creates a new browser, and...
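For reference, the configuration described above might look roughly like the sketch below. It combines the `PLAYWRIGHT_CDP_URL` value quoted in the report with the download handler and reactor settings that scrapy-playwright requires. It only illustrates the setup and is not expected to change the reported behavior on its own.

```python
# settings.py (sketch of the setup described in the report)

# Route HTTP and HTTPS requests through the scrapy-playwright handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Connect to an already running browser over CDP instead of launching one.
# The port matches the one mentioned in the report.
PLAYWRIGHT_CDP_URL = "http://localhost:40000"
```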
I'm having trouble getting Scrapy + Playwright to respect caches when crawling, when using a persistent context. I've tried to get it down to a minimal example, which you can...
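The report's own minimal example is not shown here. As general background, a persistent context in scrapy-playwright is declared by passing `user_data_dir` in `PLAYWRIGHT_CONTEXTS`. The sketch below assumes a context named `persistent` and a placeholder directory path, both chosen only for illustration.

```python
# settings.py (sketch; the context name and directory are placeholders)
PLAYWRIGHT_CONTEXTS = {
    "persistent": {
        # Supplying user_data_dir makes this a persistent context,
        # so browser state is kept in this directory between runs.
        "user_data_dir": "/tmp/scrapy-playwright-profile",
    },
}

# Request meta that selects the context above for a Playwright request.
PERSISTENT_META = {
    "playwright": True,
    "playwright_context": "persistent",
}
```

A request would then be created with `meta=PERSISTENT_META` so that it runs inside the persistent context.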
I am attempting an SSO login to a website (I have access to this) via scrapy-playwright, and find that my playwright-script hangs when I use `wait_for_function` and this recursively produces...
Greetings! I am using scrapy-playwright with a Selenium Grid browser cluster. If the spider's crawling process is delayed, the cluster can forcibly close the session and...
When Chrome is killed or crashes, the context keeps trying to create new pages and throws an exception:
```log
2023-01-31 19:29:51 [scrapy.core.scraper] ERROR: Error downloading
Traceback (most recent call last):
  File "/home/test/source/test/venv/lib/python3.10/site-packages/twisted/internet/defer.py",...
```
```
Python 3.9.13
Daphne 4.0.0
Django 4.1.2
Channels 4.0.0
Scrapy 2.7.0
scrapy-playwright 0.0.22
```
My settings:
```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```
My...
When AWS WAF challenges the browser, it returns the page with HTTP 202 and replaces the page content with JavaScript. The page then initiates the corresponding request. If it...