
Conditionally apply page methods

Open anvaari opened this issue 2 years ago • 1 comments

Hi. I crawl a website using scrapy_playwright with wait_for_selector. When the page doesn't exist (status = 404), scrapy_playwright waits until the timeout and then raises an exception. Is there any way to prevent scrapy_playwright from calling playwright_page_methods when the status is 404? It would save a lot of time in my case.

I tried to implement it using PLAYWRIGHT_ABORT_REQUEST in settings.py, but it didn't work:

import asyncio
async def handle_404(req):
    await req.response().status == 404
PLAYWRIGHT_ABORT_REQUEST = lambda req:asyncio.run(handle_404(req))

Thanks a lot.

anvaari, Apr 26 '22 06:04

I'll need some time to find a way to solve this in a general way. In the meantime, I'd suggest passing a specific timeout value to the wait_for_selector method as a workaround.

elacuesta, May 09 '22 01:05
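
For readers who want to try the timeout workaround mentioned above, here is a minimal sketch. It assumes the `PageMethod` helper from `scrapy_playwright.page`, which forwards extra keyword arguments to the underlying Playwright call. The function name `build_playwright_meta`, the selector, and the 5-second value are hypothetical, chosen only for illustration.

```python
def build_playwright_meta(selector, timeout_ms=5_000):
    """Build request meta that waits for `selector` for at most `timeout_ms`.

    Hypothetical helper, not part of scrapy-playwright.
    """
    # Imported here so this helper can be defined even where
    # scrapy-playwright is not installed; a real project would
    # normally import it at module level.
    from scrapy_playwright.page import PageMethod

    return {
        "playwright": True,
        "playwright_page_methods": [
            # Keyword arguments are passed on to Page.wait_for_selector,
            # so `timeout` (milliseconds) overrides Playwright's default.
            PageMethod("wait_for_selector", selector, timeout=timeout_ms),
        ],
    }


# Hypothetical usage inside a spider:
#   yield scrapy.Request(url, meta=build_playwright_meta("div.content"))
```

Note that this only shortens the wait. On a 404 page the selector still never appears, so the wait still ends in a timeout error, just sooner than with Playwright's default of 30 seconds.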

I've decided to archive this for now. I want to avoid building a complex DSL for page methods; they are intended for simple actions. For more complex scenarios, it's possible to access the full page with the playwright_include_page meta key.

elacuesta, Jan 01 '24 18:01
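
The playwright_include_page approach mentioned above could be sketched roughly as follows. This is not code from the thread. It assumes the page is exposed at `response.meta["playwright_page"]`, that `response.status` reflects the 404, and that the request sets `handle_httpstatus_list` so Scrapy delivers the 404 response to the callback instead of filtering it out. The helper name `wait_unless_404` and the selector are hypothetical.

```python
async def wait_unless_404(response, selector, timeout_ms=10_000):
    """Return the rendered HTML, or None if the response is a 404.

    Hypothetical helper: expects the Playwright page in
    response.meta["playwright_page"], as provided when the request
    sets playwright_include_page=True.
    """
    page = response.meta["playwright_page"]
    try:
        if response.status == 404:
            # Skip the wait entirely: the selector will never appear.
            return None
        await page.wait_for_selector(selector, timeout=timeout_ms)
        return await page.content()
    finally:
        # Pages requested via playwright_include_page are not closed
        # automatically, so close them here in every case.
        await page.close()


# Hypothetical wiring inside a spider:
#
#   def start_requests(self):
#       yield scrapy.Request(
#           url,
#           meta={
#               "playwright": True,
#               "playwright_include_page": True,
#               "handle_httpstatus_list": [404],  # let 404s reach parse()
#           },
#       )
#
#   async def parse(self, response):
#       html = await wait_unless_404(response, "div.content")
```

Because the wait now happens in the callback rather than in playwright_page_methods, the request itself should carry no wait_for_selector page method; otherwise the original delay on 404 pages would remain.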