Feedparser seems to occasionally hang and has no timeout

Open peterashwell opened this issue 9 years ago • 4 comments

According to this, the default timeout in urllib2 is -1, or None, meaning no timeout at all. That is a problem for long-running programs, because an occasional hung connection will hang everything.

The solution is pretty simple: add a timeout to the 'open' call here: https://github.com/kurtmckee/feedparser/blob/39a7157ff8280991d71af0d79e6412f66cffd470/feedparser/http.py#L175

I'll fork and try to make a fix.
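
As a rough illustration of the kind of change described above, here is a minimal sketch of passing a per-call timeout to a urllib opener in Python 3. The `fetch` helper, its 10-second default, and the choice to return None on a timeout are assumptions for illustration, not feedparser's actual code:

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Return the response body, or None if the server stalls past `timeout`."""
    opener = urllib.request.build_opener()
    try:
        # OpenerDirector.open() takes an optional timeout in seconds. Without
        # it, the global socket default applies, which is None (block forever)
        # unless the program has changed it.
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except socket.timeout:
        # Raised when the server accepts the connection but stops responding.
        return None
    except urllib.error.URLError as err:
        # A timeout while connecting or sending is wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return None
        raise
```

With this shape, a long-running poller can treat None as "skip this round" instead of hanging on one bad connection.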

peterashwell, Jul 10 '16 00:07

This issue seems like a real problem, since there seems to be no clean workaround. I can't wait to see the next release because of that.

rigid, Feb 19 '17 22:02

If you want a quick workaround, you can monkey patch feedparser to use the requests library with a proper timeout instead. It also fixes the HTTPS certificate issues I had with feedparser's default URL-opening implementation. This is how I do it:

import requests
import feedparser

# Replace feedparser's URL opener with a requests call that gives up after 5 seconds.
feedparser._open_resource = lambda *args, **kwargs: feedparser._StringIO(requests.get(args[0], timeout=5).content)

Update: On versions 6.x and above, use the following:

import requests
import feedparser

# Extra request headers to send, if any (for example a User-Agent).
headers = {}

feedparser.api._open_resource = lambda *args, **kwargs: requests.get(args[0], headers=headers, timeout=5).content

darklow, Jul 04 '17 17:07

The above did the job for my error.

I have a very simple polling app. Once in a while feedparser does not return, and the script needs ^C twice to exit. It then prints:

^CTraceback (most recent call last):
  File "frontend/myfeed/src/main.py", line 48, in main
  File "/home/user/.local/share/virtualenvs/workspace_python-Cp_/lib/python3.7/site-packages/feedparser.py", line 3841, in parse
    data = f.read()
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 464, in read
    return self._readall_chunked()
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 574, in _readall_chunked
    value.append(self._safe_read(chunk_left))
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/http/client.py", line 620, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/home/user/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
KeyboardInterrupt

^CTraceback (most recent call last):
  File "frontend/myfeed/src/main.py", line 56, in <module>
  File "frontend/myfeed/src/main.py", line 53, in main

Not sure whether this is related or whether the above fixes it. I am using the latest version installed with pip.
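
The traceback above ends in a blocking recv_into on a socket, which never returns if no timeout is set and the server goes silent. One blunt workaround that needs no patching is a process-wide default socket timeout, sketched below. The `set_global_timeout` helper and the 10-second value are assumptions for illustration, and the thread does not confirm that this resolves the hang reported here:

```python
import socket


def set_global_timeout(seconds=10.0):
    """Set the default timeout for every socket created afterwards.

    Returns the previous default so the caller can restore it later.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    return previous


# Call once at startup, before any feed is fetched.
set_global_timeout(10.0)
```

Note that the timeout applies to each blocking socket operation rather than to the request as a whole, so a server that keeps trickling data can still hold a read open for longer than the configured value.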

ghost, Sep 11 '19 16:09

> If you want a quick workaround you can monkey patch and use requests lib instead with proper timeout

Having a broken implementation leads devs to write workarounds like this, which have their own issues, and other devs then copy-paste the wrong solution. This workaround ignores the etag and modified arguments, so the whole feed is downloaded every time, which is inefficient and may cause servers to block you. Simply saying that you should use a separate library to fetch the data and pass it to feedparser is not that convenient either: devs then need to implement a function that sends the appropriate headers for etag and modified, and the modified header needs proper formatting. It is not as simple as importing requests and calling requests.get.
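
To make the point about etag and modified concrete, here is a minimal sketch of a conditional fetch with a timeout. It uses the standard library's urllib rather than requests so that it has no third-party dependencies. The helper names `conditional_headers` and `fetch_feed`, the return shape, and the 10-second default are all assumptions for illustration, not part of feedparser:

```python
import calendar
import email.utils
import time
import urllib.error
import urllib.request


def conditional_headers(etag=None, modified=None):
    """Build If-None-Match / If-Modified-Since headers for a conditional GET.

    `modified` may be an already formatted HTTP date string or a UTC
    time.struct_time, which is formatted here.
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if modified:
        if isinstance(modified, time.struct_time):
            # HTTP dates look like "Thu, 01 Jan 1970 00:00:00 GMT".
            modified = email.utils.formatdate(calendar.timegm(modified), usegmt=True)
        headers["If-Modified-Since"] = modified
    return headers


def fetch_feed(url, etag=None, modified=None, timeout=10.0):
    """Return (status, body, etag, last_modified); body is empty on 304."""
    request = urllib.request.Request(url, headers=conditional_headers(etag, modified))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.status,
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep the validators we already have.
            return 304, b"", etag, modified
        raise
```

On a 200 response the body can be handed to feedparser.parse(body), and the returned ETag and Last-Modified values stored and passed back in on the next poll.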

adbenitez, Aug 15 '21 20:08

feedparser has dropped all custom HTTP client code in favor of the requests package. (This change has not been released yet because I am still working on a significant effort to update the code and documentation.) At this time, the code has a 10-second HTTP request timeout set.

I'm closing this issue for this reason.

kurtmckee, Apr 10 '23 12:04