
🎭 Playwright integration for Scrapy

51 scrapy-playwright issues

I am facing an issue with Chromium when trying to download a PDF file: `response.body` contains the PDF viewer plugin's HTML instead of the file's bytes. There's already a related fix...

upstream issue
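One way to detect the symptom described above is to check whether `response.body` actually starts with the PDF signature `%PDF-` rather than with HTML markup. The helper `looks_like_pdf` is a hypothetical name, not part of scrapy-playwright; it is a minimal sketch that only inspects the leading bytes and does not work around the upstream Chromium behavior.

```python
def looks_like_pdf(body: bytes) -> bool:
    """Return True if the payload looks like a real PDF file.

    Hypothetical helper: PDF files carry the ASCII signature "%PDF-" at the
    start, whereas Chromium's built-in viewer page is HTML.
    """
    # Tolerate a few stray leading bytes (whitespace, BOM) by scanning a
    # small prefix instead of requiring the signature at offset 0.
    return b"%PDF-" in body[:1024]
```

In a spider callback, a check such as `if not looks_like_pdf(response.body): self.logger.warning("Got viewer HTML for %s", response.url)` would flag responses affected by this issue.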

Hey folks. At BuscoJobs, the company we work at, we have applied Scrapy Playwright to 48 spiders. We have written a guide (in Spanish) to help users get started with this...

documentation

I sometimes get this error when I use scrapy-playwright:

```
2023-03-31 09:33:35 [asyncio] ERROR: Task was destroyed but it is pending!
source_traceback: Object created at (most recent call last):
File...
```

upstream issue

I started a Chrome browser with CDP on port 40000 and set `PLAYWRIGHT_CDP_URL = "http://localhost:40000"` in my settings file, but every time Scrapy starts working, it creates a new browser, and...

deprioritized
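For reference, a settings file that attaches scrapy-playwright to an already-running browser over CDP might look like the sketch below. `PLAYWRIGHT_CDP_URL` and the download-handler and reactor settings are the ones quoted in these issues; port 40000 comes from the report above, and the `--remote-debugging-port` launch flag in the comment is an assumption about how that browser was started. This sketch does not change the behavior the issue describes.

```python
# settings.py (sketch): route requests through scrapy-playwright and attach
# to a Chromium instance that is already listening for CDP on port 40000.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Endpoint of the externally started browser (assumed to be launched with
# something like: chrome --remote-debugging-port=40000).
PLAYWRIGHT_CDP_URL = "http://localhost:40000"
```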

I'm having trouble getting Scrapy + Playwright to respect the cache when crawling with a persistent context. I've tried to reduce it to a minimal example, which you can...

upstream issue
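For context, scrapy-playwright launches a context as persistent when its `PLAYWRIGHT_CONTEXTS` entry includes a `user_data_dir` keyword argument, and a request selects a named context through the `playwright_context` meta key. The sketch below shows that setup; the context name `"persistent"`, the directory path, and the variable `REQUEST_META` are hypothetical illustrations, and whether the browser cache is honored across runs is exactly what this issue questions.

```python
# settings.py (sketch): declare a named persistent context.
PLAYWRIGHT_CONTEXTS = {
    "persistent": {
        # Passing user_data_dir makes scrapy-playwright launch this context
        # as a persistent one, so the browser profile lives on disk between
        # runs. The path is a hypothetical example location.
        "user_data_dir": "/tmp/scrapy-playwright-profile",
    },
}

# Hypothetical meta dict a spider would pass to scrapy.Request(..., meta=...)
# so the request is rendered by Playwright inside the persistent context.
REQUEST_META = {
    "playwright": True,
    "playwright_context": "persistent",
}
```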

I am attempting an SSO login to a website (which I have access to) via scrapy-playwright, and my Playwright script hangs when I use `wait_for_function`; this recursively produces...

needs more info

Greetings! I am using scrapy-playwright with a Selenium Grid browser cluster. If the spider's crawl is delayed, the cluster can forcibly close the session, and...

When Chrome is killed or crashes, the context keeps creating new pages and throws an exception:

```log
2023-01-31 19:29:51 [scrapy.core.scraper] ERROR: Error downloading
Traceback (most recent call last):
  File "/home/test/source/test/venv/lib/python3.10/site-packages/twisted/internet/defer.py",...
```

```
Python 3.9.13
Daphne 4.0.0
Django 4.1.2
Channels 4.0.0
Scrapy 2.7.0
scrapy-playwright 0.0.22
```

My settings:

```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

My...

needs more info

When AWS WAF challenges the browser, it returns the page with HTTP status 202 and replaces the page content with JavaScript. The page then initiates the corresponding request. If it...

needs more info