feedparser
opening file mentioned in feed doctype
I am currently getting an "Unknown IO error" printed to stderr when calling feedparser.parse('http://feeds.feedburner.com/news_trailbusterscom?format=xml').
The feed starts with this header:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">
I used strace to see where this happens and saw a stat('http://my.netscape.com/publish/formats/rss-0.91.dtd') call for the DTD URL given in the doctype. I then parsed a feed whose doctype was changed to file://etc/hosts, and strace showed a successful stat() and open() for the file I had put in the doctype URL.
This behaviour seems suspicious to me. Letting the content of a feed, which is untrusted input, make the parser open a file on the local system does not look safe. Is this OK?
No, this behavior is not okay, and it is actually pretty serious. Perhaps feedparser should use defusedxml, which wraps a number of Python XML libraries to prevent this kind of thing and has nice explanations of these vulnerabilities.
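To make the idea concrete, here is a minimal sketch of the kind of hardening being discussed. It does not use defusedxml and it is not feedparser's actual code. It only uses the standard library's xml.sax and turns off external entity loading, so the parser should never try to fetch whatever the doctype points at. The names parse_feed_safely, RecordingResolver, TitleCollector and the FEED sample are made up for illustration, and file:///etc/hosts just stands in for an attacker-chosen path.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    EntityResolver,
    feature_external_ges,
    feature_external_pes,
)
from xml.sax.xmlreader import InputSource

# Hypothetical sample feed: the doctype's system identifier points at a local file.
FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "file:///etc/hosts">
<rss version="0.91"><channel><title>Example</title></channel></rss>
"""


class RecordingResolver(EntityResolver):
    """Hypothetical helper: records every external entity the parser asks for."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        # Defensive fallback: hand back an empty in-memory source,
        # so nothing is read from disk or the network even if we get here.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))
        return source


class TitleCollector(ContentHandler):
    """Hypothetical helper: collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def parse_feed_safely(data):
    """Parse feed bytes with external general and parameter entities disabled."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    resolver = RecordingResolver()
    collector = TitleCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(data))
    return collector.titles, resolver.requested


if __name__ == "__main__":
    titles, requested = parse_feed_safely(FEED)
    print("titles:", titles)
    print("external entities requested:", requested)
```

Running this under strace, as in the original report, would be a reasonable way to check that no stat() or open() happens for the doctype URL. defusedxml goes further than this sketch (it can also reject DTDs and entity declarations outright), so treat the above as an illustration of the principle rather than a complete fix.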
@johniez thanks for reporting this!
@twm, great suggestion! I'd like feedparser to be far more stable and secure than it is, so this may be a necessary change to protect users! I'll look into it as soon as I can!