What happens when a response takes very long?
I tried to add:

```python
response = yield from asyncio.wait_for(
    self.session.get(url, allow_redirects=False), 20)
```

instead of

```python
response = yield from self.session.get(url, allow_redirects=False)
```

in order to prevent hanging on a slow server by introducing a max timeout, but this seems to open up a lot of `CancelledError`s (and a lot of `Task was destroyed but it is pending` messages). Any idea?
You should wrap the `wait_for` call in a try/except block and gracefully close the task. At first glance, https://github.com/aosabook/500lines/blob/master/crawler/code/crawling.py#L233 is a good place for catching it.
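A minimal sketch of that advice, written in the Python 3.4 `yield from` style the crawler uses; the standalone `fetch` signature and the `max_timeout` parameter here are illustrative, not the crawler's actual code:

```python
import asyncio
import logging

LOGGER = logging.getLogger(__name__)

@asyncio.coroutine
def fetch(session, url, max_timeout=20):
    # `session` is assumed to be an aiohttp.ClientSession-like object,
    # as in crawling.py.
    try:
        response = yield from asyncio.wait_for(
            session.get(url, allow_redirects=False), max_timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the slow request and raises TimeoutError here;
        # catching it keeps the worker alive instead of leaking a
        # CancelledError out of the coroutine.
        LOGGER.warning('timeout fetching %r', url)
        return None
    return response
```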
I am a bit confused. It seems you want to put a timeout there, while I'm talking about the crawling `get` (targeting the server). The place you suggest would make the worker end gracefully whenever getting an item from the queue takes too long, whereas I'm in the `fetch` method (https://github.com/aosabook/500lines/blob/master/crawler/code/crawling.py#L175), trying to put the `wait_for` around the request itself. Is that still correct?
I put it in both places, and that seems to solve some issues. But now, whenever the queue is empty, it tries to stop the worker and throws `ERROR:asyncio:Task was destroyed but it is pending!`. I catch it, but I still get two messages per worker at the end of the script (not so nice when you want to save the logging):

```
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task pending coro=<get() done, defined at /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/asyncio/queues.py:160> wait_for=<Future pending cb=[Task._wakeup()]> cb=[_release_waiter(<Future cancelled>)() at /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/asyncio/tasks.py:333]>
```
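For that shutdown noise specifically, the usual fix is to await the cancelled workers before the event loop is closed, so the cancellation can actually propagate. A rough sketch against the crawler's `crawl` structure (assuming `self.q`, `self.work`, and `self.max_tasks` as in crawling.py):

```python
import asyncio

@asyncio.coroutine
def crawl(self):
    workers = [asyncio.Task(self.work())
               for _ in range(self.max_tasks)]
    yield from self.q.join()   # wait until every queued URL is processed
    for w in workers:
        w.cancel()             # wake workers blocked on self.q.get()
    # Let the cancellations propagate before the loop is closed;
    # skipping this step is what produces
    # "ERROR:asyncio:Task was destroyed but it is pending!".
    yield from asyncio.gather(*workers, return_exceptions=True)
```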