opening file mentioned in feed doctype

Open johniez opened this issue 7 years ago • 3 comments

I am currently getting an "Unknown IO error" printed to stderr while using feedparser.parse('http://feeds.feedburner.com/news_trailbusterscom?format=xml'). The feed defines this header:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?>
<?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

I used strace to see where this happens and saw a stat('http://my.netscape.com/publish/formats/rss-0.91.dtd') call for the doctype from the XML header. I then tried parsing a feed with the doctype changed to file://etc/hosts, and strace showed a successful stat() and open() for the file I had filled in as the doctype URL.

This behaviour seems suspicious to me. Allowing user-supplied input to open a file on the system is not good. Is this OK?
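
For anyone wanting to check this without strace, here is a minimal sketch using only the standard library's xml.sax. It assumes the lookup comes from the SAX layer resolving the external DTD, which I have not confirmed in feedparser's own code. The feed body, RecordingResolver, and parse_without_external_entities are made up for illustration. With the external general entities feature switched off, the parser should never ask the resolver for the DTD.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

# Hypothetical feed, modelled on the header quoted above.
FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91"><channel><title>example</title></channel></rss>
"""


class RecordingResolver(EntityResolver):
    """Records every system identifier the parser asks to resolve."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        # Return an empty in-memory source so that nothing is read
        # from disk or the network even if resolution is attempted.
        source = xml.sax.InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))
        return source


def parse_without_external_entities(data):
    """Parse data with external entities disabled; return requested system ids."""
    resolver = RecordingResolver()
    parser = xml.sax.make_parser()
    # Ask the parser not to load external general entities.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(ContentHandler())
    parser.setEntityResolver(resolver)
    parser.parse(io.BytesIO(data))
    return resolver.requested
```

If the returned list is empty, the parser did not try to resolve the DTD. Flipping the feature to True on a test copy is one way to see which identifiers would otherwise be requested.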

johniez, Jun 21 '17 21:06

No, this behavior is not okay, and it is actually pretty serious. Perhaps feedparser should use defusedxml, which wraps a number of Python XML libraries to prevent this stuff and has nice explanations of these vulnerabilities.
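
To illustrate the general idea of refusing documents that carry a DOCTYPE, here is a rough stdlib-only sketch. It is not defusedxml's implementation and not feedparser code. DoctypeForbidden and reject_doctype are hypothetical names, and the approach is simply to raise from expat's doctype callback before anything external could be looked up.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Hypothetical error type used only in this sketch."""


def reject_doctype(data: bytes) -> None:
    """Parse data with expat, raising DoctypeForbidden on any DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as a DOCTYPE declaration is seen.
        raise DoctypeForbidden(
            f"DOCTYPE {name!r} references {system_id!r}"
        )

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
```

A caller would wrap reject_doctype in a try/except and treat DoctypeForbidden as "do not parse this feed". That is stricter than just skipping the external fetch, so feeds like the one in this report would be rejected outright.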

twm, Jan 14 '18 03:01

@johniez thanks for reporting this!

@twm, great suggestion! I'd like feedparser to be far more stable and secure than it is, so this may be a necessary change to protect users! I'll look into it as soon as I can!

kurtmckee, Apr 27 '18 20:04