feedparser
feedparser should not attempt to parse HTTP error pages
```
$ python3 -c 'import feedparser; import pprint; pprint.pp(feedparser.parse("http://httpstat.us/500"));'
{'bozo': 1,
 'entries': [],
 'feed': {},
 'headers': {'content-length': '25',
             'connection': 'close',
             'content-type': 'text/plain',
             'date': 'Tue, 23 Jul 2024 09:26:01 GMT',
             'server': 'Kestrel',
             'set-cookie': 'ARRAffinity=0b6744c5c65f60053b4261472f06470832ebaff4bed4a8258e6eb824fe0a51e1;Path=/;HttpOnly;Domain=httpstat.us',
             'request-context': 'appId=cid-v1:3548b0f5-7f75-492f-82bb-b6eb0e864e53'},
 'href': 'http://httpstat.us/500',
 'status': 500,
 'encoding': 'us-ascii',
 'bozo_exception': SAXParseException('syntax error'),
 'version': '',
 'namespaces': {}}
```
feedparser reports `SAXParseException('syntax error')` on an HTTP 500 response, which suggests that it attempted to parse the body of the error page. This is a confusing error. Ideally, feedparser should not attempt to parse HTTP error pages at all, and should clearly report the HTTP error instead.
(Also, this should really raise an exception, but that's a separate issue; see #329.)
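Until something like this is handled inside feedparser, a caller can check the `status` field of the result before looking at `bozo_exception`. The following is only a rough sketch of that workaround: `FeedHTTPError` and `check_http_status` are hypothetical names, not part of feedparser, and the sketch assumes the result exposes `status` and `href` as shown in the output above (`status` may be absent, for example when parsing a local file or string).

```python
class FeedHTTPError(Exception):
    """Hypothetical exception for this sketch; not part of feedparser."""

    def __init__(self, status, href):
        super().__init__(f"HTTP {status} while fetching {href}")
        self.status = status
        self.href = href


def check_http_status(result):
    """Raise FeedHTTPError if the parsed result carries an HTTP error status.

    `result` is any dict-like object with optional 'status' and 'href'
    keys, such as the value returned by feedparser.parse().
    """
    status = result.get("status")
    # 'status' can be missing when no HTTP request was made.
    if status is not None and status >= 400:
        raise FeedHTTPError(status, result.get("href"))
    return result


# Intended usage (requires feedparser and network access):
#   import feedparser
#   feed = check_http_status(feedparser.parse("http://httpstat.us/500"))
```

With this check in place, the caller sees the HTTP status as the reported failure instead of the `SAXParseException` produced by parsing the error body.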