feedparser
Crash in feedparser 6.0.10
I noticed that the latest released version of feedparser crashes when a CDATA section contains C code snippets. Here is an example of how to reproduce the issue.
- Install feedparser via
python -m pip install feedparser
- RSS XML Crash example - rss.zip
- or you could use the original feed
https://blog.trailofbits.com/feed/
- Run the following script:
import feedparser
with open("./rss_code_crash.xml", "r") as f:
    rss_data = f.read()
rss = feedparser.parse(rss_data)
# Or just this:
#rss = feedparser.parse('https://blog.trailofbits.com/feed/')
I tested the same issue on the develop branch, but the crash does not occur there. Thanks for your support.
This is the minimum reproducible example:
<content:encoded xmlns:content="bogus">
<![CDATA[
<!h<!h<!h<
]]>
</content:encoded>
The crash is coming from within the Python standard library: _markupbase.py at line 134 raises an AssertionError stating "unexpected '<' char in declaration".
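For reference, the assertion can probably be triggered without feedparser at all. The sketch below is only an illustration under assumptions, not the code path feedparser itself takes: `DeclarationProbe` and `probe` are hypothetical names, and `_markupbase` is a private CPython module whose behaviour may differ between Python versions.

```python
import _markupbase  # private CPython module; behaviour may vary by version


class DeclarationProbe(_markupbase.ParserBase):
    """Hypothetical minimal subclass, used only to call parse_declaration()."""

    def error(self, message):
        # Older Python versions route parse errors through error();
        # raise the same exception type that newer versions raise directly.
        raise AssertionError(message)


def probe(markup):
    """Return the assertion message raised for `markup`, or None if none is raised."""
    parser = DeclarationProbe()
    parser.reset()
    parser.rawdata = markup
    try:
        # Parse the declaration that starts at offset 0 ("<!...").
        parser.parse_declaration(0)
    except AssertionError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Same payload as the minimal reproducible example above.
    print(probe("<!h<!h<!h<"))
```

If the standard library behaves as described in this thread, the printed message should match the one quoted above.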
On a side note, it appears that Trail of Bits is using Wordpress. Perhaps this is a bug that exists in Wordpress or one of the plugins in its ecosystem and could be fixed there, as well!
Is there a specific code change in the develop branch that fixed the problem and interprets the content in a different way?
In the develop branch everything works as expected:
- python -m pip install git+https://github.com/kurtmckee/feedparser@develop
- using your RSS sample or my RSS sample as input
- executing the python script from above and everything works fine
That is why I thought it was a bug in feedparser. I will have a look into the Wordpress topic.
Yep, I saw the same thing with the develop branch.
The crash is a bug in the feedparser 6.0.10 release. However, that's happening because Wordpress is failing to escape the code in its <pre> blocks. It's two bugs, in different products, not one.
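To illustrate the escaping that appears to be missing on the Wordpress side, here is a minimal Python sketch. `render_pre` is a hypothetical helper, not Wordpress or feedparser code; it only shows that escaping a snippet before placing it in a <pre> block removes the raw `<!` sequences that trip the parser.

```python
import html


def render_pre(code):
    """Hypothetical helper: wrap a code snippet in <pre>, escaping markup characters."""
    # html.escape turns "<", ">" and "&" into entities, so the snippet
    # can no longer be mistaken for an SGML declaration such as "<!h".
    return "<pre>" + html.escape(code, quote=False) + "</pre>"


if __name__ == "__main__":
    print(render_pre("<!h<!h<!h<"))
```

With the snippet escaped this way, the content no longer contains anything that looks like a declaration, so the feedparser 6.0.10 crash would presumably not be reached for this input.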
Coincidentally, I was about to raise this exact same issue for the same feed. Looking forward to a fix for it.
Hi, just curious if this is going to be addressed at some point. Since the issue seemed fixed in the develop branch, could we get a new release? I understand that this is also a Wordpress issue, but if this can be used with the changes in develop, a release would be great. Thanks (I'll note that I haven't actually checked to see if the feed itself has changed and fixed itself yet)
The develop branch is not in a state where it can be released yet; it will take many, many hours of work to get it into a stable state, and I can't commit the required time until after the new year.