
Not Well Formed RSS Feed

Open MrTyton opened this issue 7 years ago • 6 comments

Hi,

I've been getting a bozo exception like the following:

    In [5]: feedparser.parse("owlturd.com/rss")
    Out[5]:
    {'bozo': 1,
     'bozo_exception': xml.sax._exceptions.SAXParseException('not well-formed (invalid token)'),
     'encoding': u'utf-8',
     'entries': [],
     'feed': {},
     'namespaces': {},
     'version': u''}

A manual inspection of the feed shows that it's encoded in UTF-8, and from what I've read, this exception means the feed is badly encoded or otherwise malformed. As it stands, I can't tell the parser to ignore the UTF-8 errors and parse what it can; the only option seems to be downloading the feed myself, cleaning it up, and passing the result to the parser function, which seems inefficient. Is there another way around this, or some way to pass more arguments to feedparser.parse()? I can't find any documentation for that.

MrTyton, May 05 '17 07:05

Use http://owlturd.com/rss instead of owlturd.com/rss
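
The reason this helps: `feedparser.parse()` accepts a URL, a local file path, or a string of feed content, so an address without a scheme is not fetched over HTTP. A minimal sketch of a guard follows; `ensure_scheme` is a hypothetical helper (not part of feedparser), and it assumes the input is always meant to be a web address rather than a file path or raw XML:

```python
from urllib.parse import urlparse


def ensure_scheme(address, default_scheme="http"):
    """Return address unchanged if it is an http(s) URL, else prefix a scheme."""
    if urlparse(address).scheme in ("http", "https"):
        return address
    # Assumption: anything without an http(s) scheme is a bare host/path.
    return f"{default_scheme}://{address}"


print(ensure_scheme("owlturd.com/rss"))  # prints http://owlturd.com/rss
```

The returned string can then be passed to `feedparser.parse()` in place of the bare address.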

evdoks, Nov 14 '17 13:11

I am facing the same problem too. Can you help me parse this feed? It works fine with feedparser==4.1 but not with the latest release. https://www.prabhasakshi.com/feed.aspx?cat_id=14

deepakmishra, Apr 11 '18 10:04

The problem lies in the feed itself. Please contact the feed author or the maintainer of the feed generator. This issue is not related to feedparser and can be closed.

buhtz, Apr 11 '18 14:04

Yes, you can close it. I found a workaround.

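    import feedparser
    import requests

    p = feedparser.parse(url)
    # Retry only when nothing was parsed and the XML parser reported malformed input.
    # .get() avoids a KeyError when no bozo_exception was recorded.
    if not p['entries'] and "not well-formed" in str(p.get('bozo_exception', '')):
        # The response bytes are assumed to be UTF-16: decode them by hand, then
        # replace the "utf-16" label so it no longer conflicts with the decoded text.
        rss1 = requests.get(url).content.decode("utf-16")
        rss1 = rss1.replace("utf-16", "unicode")
        p = feedparser.parse(rss1)
    entries = p['entries']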
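
The manual decoding step could be made less feed-specific. The following is only a sketch under stated assumptions: `decode_feed_bytes` is a hypothetical helper (not part of feedparser or requests), it looks only at UTF-8 and UTF-16 byte order marks, and it falls back to UTF-8 with undecodable bytes replaced rather than raising.

```python
import codecs

# Byte order marks mapped to a codec that consumes the mark while decoding.
# Assumption: UTF-32 feeds are not handled here.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_feed_bytes(raw, fallback="utf-8"):
    """Decode raw feed bytes, preferring a byte order mark if one is present."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw.decode(encoding)
    # No mark found: decode leniently so a stray bad byte does not abort parsing.
    return raw.decode(fallback, errors="replace")
```

The resulting text can be handed to `feedparser.parse()`, which also accepts a string; the encoding named in the XML declaration may still need adjusting afterwards, as the `replace` call in the workaround does.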

deepakmishra, Apr 11 '18 14:04

Technically, your approach is a solution. But here is some food for thought: we shouldn't write newsfeed clients that accept corrupt data which doesn't conform to a standard. Our clients should encourage the user to contact the feed owner.

buhtz, Apr 11 '18 20:04

Please close.

buhtz, Jul 14 '19 20:07