
feedparser should not attempt to parse HTTP error pages

Open · dechamps opened this issue · 0 comments

$ python3 -c 'import feedparser; import pprint; pprint.pp(feedparser.parse("http://httpstat.us/500"));'
{'bozo': 1,
 'entries': [],
 'feed': {},
 'headers': {'content-length': '25',
             'connection': 'close',
             'content-type': 'text/plain',
             'date': 'Tue, 23 Jul 2024 09:26:01 GMT',
             'server': 'Kestrel',
             'set-cookie': 'ARRAffinity=0b6744c5c65f60053b4261472f06470832ebaff4bed4a8258e6eb824fe0a51e1;Path=/;HttpOnly;Domain=httpstat.us',
             'request-context': 'appId=cid-v1:3548b0f5-7f75-492f-82bb-b6eb0e864e53'},
 'href': 'http://httpstat.us/500',
 'status': 500,
 'encoding': 'us-ascii',
 'bozo_exception': SAXParseException('syntax error'),
 'version': '',
 'namespaces': {}}

feedparser reports SAXParseException('syntax error') for a response with HTTP status 500, which suggests that it attempted to parse the body of the 500 error page as a feed. This error is confusing. Ideally, feedparser should not attempt to parse HTTP error pages at all, and should clearly report the HTTP error instead.

(Also, this should really raise an exception, but that is a separate issue; see #329.)
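Until feedparser handles this itself, a caller can check the `status` key (present in the result above, alongside `href`) before looking at `bozo`. The sketch below is a caller-side workaround, not part of feedparser's API: `FeedHTTPError` and `raise_for_http_status` are hypothetical names, and treating any status >= 400 as an error is an assumption. Results parsed from a local string or file have no `status` key and pass through unchanged.

```python
from collections.abc import Mapping
from typing import Optional


class FeedHTTPError(Exception):
    """Hypothetical exception for an HTTP error status found in a parse result."""

    def __init__(self, status: int, href: Optional[str]) -> None:
        super().__init__(f"HTTP {status} while fetching {href}")
        self.status = status
        self.href = href


def raise_for_http_status(result: Mapping) -> Mapping:
    """Raise FeedHTTPError if a feedparser result carries an HTTP error status.

    Assumption: statuses >= 400 (client and server errors) are treated as
    failures. Results without a 'status' key (e.g. parsed from a string)
    are returned unchanged.
    """
    status = result.get("status")
    if status is not None and status >= 400:
        # Report the HTTP failure instead of the misleading bozo_exception.
        raise FeedHTTPError(status, result.get("href"))
    return result


# Usage (requires feedparser and network access):
#   import feedparser
#   d = raise_for_http_status(feedparser.parse("http://httpstat.us/500"))
```

With this check, the request above would surface as `HTTP 500 while fetching http://httpstat.us/500` rather than as a SAX syntax error.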

dechamps, Jul 23 '24 09:07