feedparser
Failed to parse description field with escaped CDATA.
Bug Description: As of the current version (2024-04-12), feedparser fails to extract the content of a description field that contains escaped CDATA, that is, a CDATA section whose angle brackets are written as the entities &lt; and &gt;. I have simplified the issue into a minimal reproducible test case (source RSS link).
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>xueqiu</title>
<link>http://xueqiu.com/hots/topic</link>
<description>xiuqiu</description>
<item>
<title>title</title>
<link>http://xueqiu.com/1630191122/288006046</link>
<description>&lt;![CDATA[some text]]&gt;</description>
<pubDate>Sat, 27 Apr 2024 08:26:02 GMT</pubDate>
<guid>http://xueqiu.com/1630191122/288006046</guid>
<dc:creator>name</dc:creator>
<dc:date>2024-04-27T08:26:02Z</dc:date>
</item>
</channel>
</rss>
Expectation: feed.entries[0].description == 'some text', but the actual result is an empty string.
If the escaped form &lt;![CDATA[some text]]&gt; is changed to the unescaped form <![CDATA[some text]]>, then it works fine.
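To make the difference between the two forms concrete, here is a minimal sketch that uses only Python's standard-library xml.etree.ElementTree, not feedparser itself. It shows what a plain XML parser sees in each case: the escaped form is ordinary character data that happens to look like a CDATA marker, while the unescaped form is a real CDATA section. The helper name description_text and the trimmed-down <item> snippets are illustrative assumptions, not part of feedparser or of the original feed.

```python
import xml.etree.ElementTree as ET

# Escaped form: the angle brackets are entities, so the parser sees plain text.
ESCAPED = "<item><description>&lt;![CDATA[some text]]&gt;</description></item>"

# Unescaped form: a genuine CDATA section wrapping the text.
LITERAL = "<item><description><![CDATA[some text]]></description></item>"


def description_text(xml_source: str) -> str:
    """Return the character data of the <description> child (hypothetical helper)."""
    root = ET.fromstring(xml_source)
    return root.findtext("description")


escaped_text = description_text(ESCAPED)
literal_text = description_text(LITERAL)

print(repr(escaped_text))  # the literal marker text, brackets included
print(repr(literal_text))  # only the wrapped text
```

In the escaped case the element's text is the whole string "<![CDATA[some text]]>", so any later stage that treats the description as HTML will meet something that looks like a markup declaration rather than visible text. That may be related to the empty result reported above, but this sketch does not confirm where feedparser drops the content.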