feedparser
Feedparser seems to occasionally hang and has no timeout
According to this, the default timeout in urllib2 is -1, or None, which means no timeout at all. This is a problem for long-running programs, where an occasional hung connection will hang everything.
The solution is pretty simple: add a timeout to the 'open' call here: https://github.com/kurtmckee/feedparser/blob/39a7157ff8280991d71af0d79e6412f66cffd470/feedparser/http.py#L175
I'll fork and try to make a fix.
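Until a fix lands, one stopgap is to change the process-wide socket default, since urllib falls back to it when no explicit timeout is passed. The following is a minimal sketch under that assumption; the helper name `with_default_socket_timeout` is hypothetical and not part of feedparser. Note that the setting affects every new socket created in the process while the call runs, so it is a blunt tool rather than a clean fix.

```python
import socket


def with_default_socket_timeout(seconds, func, *args, **kwargs):
    """Call func with a temporary process-wide default socket timeout.

    Hypothetical helper: sockets created during the call inherit the
    timeout, so a stalled read raises an error instead of blocking forever.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        return func(*args, **kwargs)
    finally:
        # Restore whatever default was in place before the call.
        socket.setdefaulttimeout(previous)


# Example usage (assumes feedparser is installed and `url` is a feed URL):
#   result = with_default_socket_timeout(10, feedparser.parse, url)
```

Because the default is global, this is not safe to toggle from several threads at once; in that case setting `socket.setdefaulttimeout` once at startup is probably simpler.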
This issue is a real problem because there seems to be no clean workaround. I can't wait to see the next release for that reason.
If you want a quick workaround, you can monkey patch feedparser to use the requests library with a proper timeout instead. It also fixes the HTTPS certificate issues I had with feedparser's default URL-opening implementation. This is how I do it:
import requests
import feedparser
feedparser._open_resource = lambda *args, **kwargs: feedparser._StringIO(requests.get(args[0], timeout=5).content)
Update: On versions 6.x and above, use the following:
import requests
import feedparser
headers = {}  # add any request headers you need here
feedparser.api._open_resource = lambda *args, **kwargs: requests.get(args[0], headers=headers, timeout=5).content
The above did the job for my error:
I have a very simple polling app. Once in a while feedparser does not return, and it takes two ^C presses to exit the script, which then prints:
^CTraceback (most recent call last):
File "frontend/myfeed/src/main.py", line 48, in main
File "/home/user/.local/share/virtualenvs/workspace_python-Cp_/lib/python3.7/site-packages/feedparser.py", line 3841, in parse
data = f.read()
File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 464, in read
return self._readall_chunked()
File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 574, in _readall_chunked
value.append(self._safe_read(chunk_left))
File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 620, in _safe_read
chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
KeyboardInterrupt
^CTraceback (most recent call last):
File "frontend/myfeed/src/main.py", line 56, in <module>
File "frontend/myfeed/src/main.py", line 53, in main
Not sure if this is related or whether the above fixes it. I am using the latest version from pip.
If you want a quick workaround you can monkey patch and use requests lib instead with proper timeout
Having a broken implementation leads to devs writing workarounds like this one, which have issues of their own, and other devs then copy-paste the wrong solution. This workaround ignores the etag and modified arguments, so the whole feed is downloaded on every request. That is inefficient and may cause servers to block you. So just saying that you should use a separate lib to fetch the data and pass it to feedparser is not that convenient. Devs then need to implement a function that passes the appropriate headers for etag and modified, and the modified header needs proper formatting. It is not as simple as importing requests and calling requests.get.
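To illustrate the header handling described above, here is a minimal sketch of building conditional-request headers from a stored etag and modified value. The helper name `conditional_headers` is hypothetical and not part of feedparser or requests. It assumes the modified value is either an already formatted HTTP date string, a Unix timestamp, or a UTC `time.struct_time`.

```python
import calendar
import time
from email.utils import formatdate


def conditional_headers(etag=None, modified=None):
    """Build If-None-Match / If-Modified-Since headers (hypothetical helper)."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if modified is not None:
        if isinstance(modified, time.struct_time):
            # Interpret the struct_time as UTC and convert it to a timestamp.
            modified = calendar.timegm(modified)
        if not isinstance(modified, str):
            # HTTP dates look like "Thu, 01 Jan 1970 00:00:00 GMT".
            modified = formatdate(modified, usegmt=True)
        headers["If-Modified-Since"] = modified
    return headers
```

The resulting dict could then be passed along the lines of `requests.get(url, headers=conditional_headers(etag, modified), timeout=5)`. A 304 status on the response means the feed has not changed and there is nothing new to hand to `feedparser.parse`.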
feedparser has dropped all custom HTTP client code in favor of the requests package. (This change has not been released yet because I am still working on a significant effort to update the code and documentation.) At this time, the code has a 10-second HTTP request timeout set.
I'm closing this issue for this reason.