scrapy-rotating-proxies

Scrapy gets stuck when a page does not respond

Open herbert-h opened this issue 7 years ago • 5 comments

Scrapy gets stuck when a page does not respond. Can I set a timeout for a page?

...
2018-01-22 09:27:09 [scrapy.extensions.logstats] INFO: Crawled 183 pages (at 42 pages/min), scraped 183 items (at 42 items/min)
2018-01-22 09:27:09 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 2, unchecked: 0, reanimated: 3, mean backoff time: 76s)
2018-01-22 09:27:39 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 4, unchecked: 0, reanimated: 0, mean backoff time: 159s)
2018-01-22 09:28:09 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 13 pages/min), scraped 196 items (at 13 items/min)
2018-01-22 09:28:09 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 3, unchecked: 0, reanimated: 1, mean backoff time: 199s)
2018-01-22 09:28:39 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 3, unchecked: 0, reanimated: 1, mean backoff time: 199s)
2018-01-22 09:29:09 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 196 items (at 0 items/min)
2018-01-22 09:29:09 [rotating_proxies.middlewares] INFO: Proxies(good: 1, dead: 2, unchecked: 0, reanimated: 2, mean backoff time: 242s)

It waits more than 5 minutes before making the first retry.

herbert-h avatar Jan 22 '18 17:01 herbert-h

More info is needed to help you. It could be a network problem on your end or on the server you're scraping.

octohedron avatar Jan 26 '18 23:01 octohedron

I ran into the same problem. When using a proxy, the default download timeout (180 seconds) applies. You can adjust it per request with the download_timeout key in Request.meta, or project-wide via the DOWNLOAD_TIMEOUT setting.
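As a rough illustration of the two options, here is a minimal sketch. The helper name timeout_meta and the values 30 and 15 are made up for this example; only the DOWNLOAD_TIMEOUT setting and the download_timeout meta key come from Scrapy itself.

```python
# settings.py: project-wide timeout in seconds (illustrative value;
# Scrapy falls back to 180 when this is not set).
DOWNLOAD_TIMEOUT = 30


def timeout_meta(seconds, meta=None):
    """Return a copy of `meta` with a per-request download timeout.

    `timeout_meta` is a hypothetical helper for this example;
    `download_timeout` is the Request.meta key Scrapy reads.
    """
    merged = dict(meta or {})  # copy, so the caller's dict is not mutated
    merged["download_timeout"] = seconds
    return merged


# In a spider one might then write, for example:
#     yield scrapy.Request(url, meta=timeout_meta(15), callback=self.parse)
```

A shorter timeout makes a hanging proxy fail sooner, so the request can be retried instead of blocking a slot for three minutes.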

ablepharus avatar Feb 08 '18 15:02 ablepharus

To explain this: you are running out of proxies. The middleware has a default delay of 180 seconds, which means it will use proxy A only once every 3 minutes. In your case all of your proxies are still waiting to cool down, so the crawler has no proxies/slots available and just waits.
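The "mean backoff time" in the log above suggests that dead proxies are retried after a growing, randomized delay. Below is a minimal sketch of capped exponential backoff with full jitter. It is not the middleware's own code: the function names are hypothetical, and the base of 300 s and cap of 3600 s are assumptions, based on what I believe the README lists as defaults for ROTATING_PROXY_BACKOFF_BASE and ROTATING_PROXY_BACKOFF_CAP.

```python
import random


def backoff_ceiling(attempt, base=300.0, cap=3600.0):
    """Upper bound (seconds) of the wait after `attempt` failed checks."""
    # Clamp the exponent so very large attempt counts cannot overflow.
    return min(cap, base * (2 ** min(attempt, 30)))


def backoff_with_jitter(attempt, base=300.0, cap=3600.0, rng=random):
    """Pick a random wait between 0 and the ceiling ("full jitter")."""
    return rng.uniform(0, backoff_ceiling(attempt, base, cap))


if __name__ == "__main__":
    for attempt in range(5):
        # The ceiling doubles per failed attempt until it reaches the cap.
        print(attempt, backoff_ceiling(attempt), round(backoff_with_jitter(attempt)))
```

With only a handful of proxies, a few consecutive failures can push every proxy's wait into the range of minutes, which would match the crawl rate dropping to 0 pages/min. If that is the cause, lowering the base and cap in the project settings, or adding more proxies, should shorten the stall.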

Granitosaurus avatar Dec 12 '18 07:12 Granitosaurus

I have this problem too, but in my case I saw that there were still unchecked proxies.

mapb1994 avatar Oct 29 '19 15:10 mapb1994

I think the fix suggested in issue #33 solved it for me. On line 123 of middlewares.py, I replaced

    if 'proxy' in request.meta and not request.meta.get('_rotating_proxy'):

with

    if 'proxy' in request.meta:

This seems to have worked for me, but I can't say so with absolute certainty.
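To see which requests the two conditions treat differently, here is a small standalone comparison. The function names and the example meta dicts (including the proxy addresses) are hypothetical; what the middleware does when the condition is true is not shown in this thread, so the sketch only evaluates the conditions themselves.

```python
def condition_original(meta):
    """The condition as quoted from the unmodified line."""
    return 'proxy' in meta and not meta.get('_rotating_proxy')


def condition_patched(meta):
    """The condition after the edit described above."""
    return 'proxy' in meta


# Hypothetical request.meta contents for three kinds of request.
cases = {
    # Carries a proxy and the '_rotating_proxy' flag.
    'flagged': {'proxy': 'http://203.0.113.5:8080', '_rotating_proxy': True},
    # Carries a proxy but no flag (e.g. set by hand).
    'manual': {'proxy': 'http://203.0.113.9:8080'},
    # Carries no proxy at all.
    'fresh': {},
}

if __name__ == "__main__":
    for name, meta in cases.items():
        print(name, condition_original(meta), condition_patched(meta))
```

Only the flagged case changes: the original condition is false for it and the patched one is true. So the edit affects requests that already carry a proxy marked with '_rotating_proxy', such as retries, and leaves the other two cases as they were.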

rajatshenoy56 avatar Aug 24 '20 06:08 rajatshenoy56