
Crash in feedparser 6.0.10

Open TheVamp opened this issue 1 year ago • 6 comments

I noticed that the latest released version of feedparser crashes when a CDATA section contains a C code snippet. Here is an example of how to reproduce the issue.

  • Install feedparser via python -m pip install feedparser
  • RSS XML Crash example - rss.zip
    • or you could use the original feed https://blog.trailofbits.com/feed/
import feedparser

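# Read the sample feed from disk and hand its contents to feedparser.
with open("./rss_code_crash.xml", "r") as f:
    rss_data = f.read()
rss = feedparser.parse(rss_data)
# Alternatively, parse the original feed directly:
# rss = feedparser.parse('https://blog.trailofbits.com/feed/')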

I tested the same input on the develop branch, but the crash does not occur there. Thanks for your support.

TheVamp avatar Jun 20 '23 09:06 TheVamp

This is the minimum reproducible example:

<content:encoded xmlns:content="bogus">
    <![CDATA[
        <!h<!h<!h<
    ]]>
</content:encoded>

The crash is coming from within the Python standard library -- _markupbase.py at line 134 raises an AssertionError stating "unexpected '<' char in declaration".
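Until a fixed release is available, one way to keep a feed reader running is to catch that AssertionError around the parse call. This is only a sketch of a workaround, not a fix: `parse_feed_safely` and its `parse` parameter are hypothetical names introduced here, and the helper simply skips the feed instead of recovering its content.

```python
def parse_feed_safely(source, parse=None):
    """Return the parsed feed, or None if parsing hits the AssertionError.

    `parse` defaults to feedparser.parse; it is a parameter only so the
    helper can be exercised with a stand-in parser (an assumption of this
    sketch, not part of feedparser's API).
    """
    if parse is None:
        # Imported lazily so the helper itself does not require feedparser
        # when a stand-in parser is supplied.
        import feedparser
        parse = feedparser.parse
    try:
        return parse(source)
    except AssertionError as exc:
        # The error originates in _markupbase.py in the standard library,
        # e.g. "unexpected '<' char in declaration".
        print(f"skipping feed, parser raised AssertionError: {exc}")
        return None
```

Calling `parse_feed_safely('https://blog.trailofbits.com/feed/')` would then return `None` for the affected feed instead of crashing the caller.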

On a side note, it appears that Trail of Bits is using Wordpress. Perhaps this is a bug that exists in Wordpress or one of the plugins in its ecosystem and could be fixed there, as well!

kurtmckee avatar Jun 20 '23 12:06 kurtmckee

Is there a specific code change in the develop branch that fixed the problem and interprets the content in a different way?

In the develop branch everything works as expected:

  • python -m pip install git+https://github.com/kurtmckee/feedparser@develop
  • using your RSS sample or my RSS sample as input
  • executing the python script from above and everything works fine

That is why I thought it was a bug in feedparser. I will look into the Wordpress topic.

TheVamp avatar Jun 21 '23 09:06 TheVamp

Yep, I saw the same thing with the develop branch.

The crash is a bug in the feedparser 6.0.10 release. However, that's happening because Wordpress is failing to escape the code in its <pre> blocks. It's two bugs, in different products, not one.
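To illustrate what "escaping" means for the feed side of the problem, here is a small sketch using Python's standard `html.escape`. It is not WordPress code (WordPress is written in PHP); it only shows what the minimal example's payload would look like once the `<` characters are escaped, so that a parser no longer sees anything resembling a `<!` declaration.

```python
import html

# The payload from the minimal reproducible example above.
snippet = "<!h<!h<!h<"

# html.escape replaces &, <, > (and, by default, quotes) with entities.
escaped = html.escape(snippet)
print(escaped)  # &lt;!h&lt;!h&lt;!h&lt;
```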

kurtmckee avatar Jun 21 '23 10:06 kurtmckee

Coincidentally, I was about to raise this exact same issue for the same feed. Looking forward to a fix for it.

fchorney avatar Jun 26 '23 19:06 fchorney

Hi, just curious if this is going to be addressed at some point. Since the issue seems to be fixed in the develop branch, could we get a new release? I understand that this is also a Wordpress issue, but if the feed can be parsed with the changes in develop, a release would be great. Thanks. (I'll note that I haven't actually checked whether the feed itself has since changed and fixed itself.)

fchorney avatar Nov 16 '23 16:11 fchorney

The develop branch is not in a state where it can be released yet; it will take many, many hours of work to get it into a stable state, and I can't commit the required time until after the new year.

kurtmckee avatar Dec 09 '23 17:12 kurtmckee