scrapy-selenium

Handle timeout exception from selenium and still return the page

Open michelts opened this issue 4 years ago • 3 comments

Hi @clemfromspace

I'm using wait_time and wait_until to wait for a page to be rendered but, sometimes, the page renders in a way I'm not expecting. If I don't use wait_time, I see the rendered content (if it rendered fast enough), but with wait_time, selenium raises a timeout exception and scrapy never parses the result.

I wonder whether the current behavior is useful at all; I'm not sure it is. I think the approach should be the opposite: handle the exception and still return whatever content was found to scrapy, so I can at least see the snapshot or the HTML content.
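A minimal sketch of that idea, not the library's actual code: run the wait, swallow the timeout, and hand back whatever the driver currently holds. The helper name `snapshot_after_wait` and its `wait` argument are hypothetical; the only selenium pieces assumed are `TimeoutException` and the driver's `page_source` attribute.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Stand-in so the sketch can run without selenium installed.
    class TimeoutException(Exception):
        pass


def snapshot_after_wait(driver, wait):
    """Run wait() and return (html, timed_out) instead of propagating a timeout.

    driver: any object with a page_source attribute (e.g. a selenium WebDriver).
    wait:   zero-argument callable that performs the wait, for example
            lambda: WebDriverWait(driver, wait_time).until(wait_until)
    """
    timed_out = False
    try:
        wait()
    except TimeoutException:
        # The condition never became true; keep whatever has rendered so far.
        timed_out = True
    return driver.page_source, timed_out
```

A patched or subclassed middleware could build its HtmlResponse from the returned HTML and store the `timed_out` flag in `request.meta`, so the spider callback can tell a partial page from a complete one.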

michelts, Mar 03 '20 18:03

Just to note, the exception I get from scrapy is:

Traceback (most recent call last):
  File ".../lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File ".../lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 38, in process_request
    response = yield method(request=request, spider=spider)
  File ".../lib/python3.6/site-packages/scrapy_selenium/middlewares.py", line 115, in process_request
    request.wait_until
  File ".../lib/python3.6/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

michelts, Mar 03 '20 19:03

I am also wondering how to correctly handle the TimeoutException, so I can still parse the page with scrapy even if the content doesn't load.

dustinmichels, Jan 29 '21 17:01

I have the same issue. In my case I want to retry a request that hit a selenium.common.exceptions.TimeoutException, but that doesn't seem to work either, because scrapy doesn't know there was a timeout, so it can't pass the response object to the retry middleware.
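One possible workaround, sketched under assumptions rather than confirmed against this library: attach an errback to the request and re-issue it from there. This assumes the exception raised in the middleware reaches the request's errback, that Scrapy exposes the original request as `failure.request`, and that the request carries the `wait_time`, `wait_until`, `screenshot` and `script` attributes of a SeleniumRequest. The helper name, the `selenium_retries` meta key and the retry limit are hypothetical.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Stand-in so the sketch can run without selenium installed.
    class TimeoutException(Exception):
        pass

MAX_SELENIUM_RETRIES = 2  # hypothetical limit, not a Scrapy setting


def retry_on_selenium_timeout(failure):
    """Errback helper: return a copy of the failed request, or None to give up."""
    if not failure.check(TimeoutException):
        return None  # some other error; leave it to the caller
    request = failure.request
    attempts = request.meta.get("selenium_retries", 0)
    if attempts >= MAX_SELENIUM_RETRIES:
        return None
    meta = dict(request.meta, selenium_retries=attempts + 1)
    # Pass the selenium-specific arguments explicitly, on the assumption that
    # replace() only copies the attributes of the base Request class.
    return request.replace(
        meta=meta,
        dont_filter=True,  # keep the duplicate filter from dropping the retry
        wait_time=request.wait_time,
        wait_until=request.wait_until,
        screenshot=request.screenshot,
        script=request.script,
    )
```

In a spider this would be wired up by passing `errback=self.on_error` to the request and, inside `on_error`, yielding the value returned by `retry_on_selenium_timeout(failure)` when it is not None.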

aivoric, Sep 24 '21 17:09