scrapy-playwright
Conditionally apply page methods
Hi.
I'm crawling a website with scrapy_playwright and using wait_for_selector.
When the page doesn't exist (status = 404), scrapy_playwright waits until the timeout expires
and then raises an exception. Is there any way to prevent scrapy_playwright from calling playwright_page_methods
when the status is 404? It would save a lot of time in my case.
I tried to implement this using PLAYWRIGHT_ABORT_REQUEST
in settings.py, but it didn't work:
    import asyncio

    async def handle_404(req):
        await req.response().status == 404

    PLAYWRIGHT_ABORT_REQUEST = lambda req: asyncio.run(handle_404(req))
Thanks a lot.
I'll need some time to find a general solution for this. In the meantime, I'd suggest passing a specific timeout value to the wait_for_selector method as a workaround.
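A minimal sketch of that workaround, assuming the page methods are passed through the request meta. The selector `div.content` and the 5000 ms value are placeholders, not taken from the original question. The wait still raises on a 404 page, only sooner than with Playwright's default of 30000 ms.

```python
from scrapy_playwright.page import PageMethod

# Hypothetical selector and timeout; adjust both to the target site.
meta = {
    "playwright": True,
    "playwright_page_methods": [
        # Extra keyword arguments are forwarded to Page.wait_for_selector;
        # timeout is expressed in milliseconds.
        PageMethod("wait_for_selector", "div.content", timeout=5000),
    ],
}
```

This dict would then be passed as `meta=meta` when building the `scrapy.Request`.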
I've decided to archive this for now. I want to avoid building a complex DSL for page methods; they are intended for simple actions. For more complex scenarios, it's possible to access the full page with the playwright_include_page meta key.
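One way the playwright_include_page approach could look for the 404 case is sketched below. It is an illustration under assumptions, not the maintainer's code: the helper name `fetch_rendered_html`, the selector `div.content`, and the timeout are hypothetical. The `handle_httpstatus_list` meta key is included because Scrapy filters non-2xx responses by default, so without it a 404 would not reach the callback.

```python
# Request meta for a hypothetical spider; the three keys are existing
# Scrapy / scrapy-playwright meta keys.
REQUEST_META = {
    "playwright": True,
    "playwright_include_page": True,  # page is exposed as response.meta["playwright_page"]
    "handle_httpstatus_list": [404],  # let 404 responses reach the callback
}


async def fetch_rendered_html(response):
    """Hypothetical helper: wait for the selector only if the page exists.

    Returns the rendered HTML, or None when the status is 404.
    """
    page = response.meta["playwright_page"]
    try:
        if response.status == 404:
            return None  # skip the wait entirely for missing pages
        # Hypothetical selector; timeout is in milliseconds.
        await page.wait_for_selector("div.content", timeout=5000)
        return await page.content()
    finally:
        # Pages obtained through playwright_include_page are not closed
        # automatically, so close them on every code path.
        await page.close()
```

Inside a spider, an `async def` callback could call `html = await fetch_rendered_html(response)` for requests created with `meta=REQUEST_META`, and return early when the result is None.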