feedparser
Parse feeds in Python
When a feed has an image and an itunes:image, the title and link are removed. #### Example: ```xml https://example.com/image.png Image Test https://example.com ``` #### Result: ``` { 'feed': { 'image':...
# Code to reproduce [feeds.tar.gz](https://github.com/kurtmckee/feedparser/files/7274713/feeds.tar.gz) ```python3 import gc import os import colorlog import psutil from concurrent import futures from feedparser import parse # from memory_profiler import profile colorlog.basicConfig(format='%(log_color)s%(asctime)s:%(levelname)s - %(message)s',...
Fixes #261 where `feed['author']` could be overridden accidentally when saving the publisher. Thanks!
**Problem:** HTML data attributes are not accepted by the sanitizer and are therefore being stripped. **Example:** This feed uses multiple data attributes in each item's HTML content: https://thewatchsite.com/forums/the-japanese-watch-discussion-forum.21/index.rss **Requested action:**...
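The requested behaviour can be illustrated with a small allow-list check. This is a minimal sketch, not feedparser's sanitizer: the function name `keep_attribute` and the `ALLOWED_ATTRIBUTES` set are hypothetical and chosen only for illustration.

```python
# Hypothetical allow-list; feedparser's real list is longer and lives elsewhere.
ALLOWED_ATTRIBUTES = frozenset({"href", "src", "alt", "title"})


def keep_attribute(name, allowed=ALLOWED_ATTRIBUTES):
    """Return True if an attribute should survive sanitizing.

    Besides the explicit allow-list, any attribute of the form
    ``data-<something>`` is accepted.
    """
    name = name.lower()
    # "data-" alone (nothing after the hyphen) is not a valid data attribute.
    is_data_attribute = name.startswith("data-") and len(name) > len("data-")
    return name in allowed or is_data_attribute


attributes = ["href", "data-post-id", "onclick", "DATA-Author", "data-"]
kept = [attr for attr in attributes if keep_attribute(attr)]
print(kept)
```

Matching on the `data-` prefix keeps the allow-list short, at the cost of accepting every data attribute rather than a curated subset.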
It seems that `ConnectionResetError` is not handled when opening the feed in `parse()`: ```py try: data = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers, request_headers, result) except urllib.error.URLError as error: result.update({...
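One way to picture the missing handling is the sketch below. It assumes nothing about feedparser's internals: `open_feed` and the `opener` callable are hypothetical stand-ins for `parse()` and `_open_resource()`, and the `bozo` / `bozo_exception` keys are used here only as an assumed way of reporting the failure.

```python
import urllib.error


def open_feed(opener, url):
    """Call opener(url) and record network failures instead of raising.

    ConnectionResetError is not a subclass of urllib.error.URLError,
    so it has to be listed explicitly in the except clause.
    """
    result = {"bozo": False, "entries": []}
    try:
        result["data"] = opener(url)
    except (urllib.error.URLError, ConnectionResetError) as error:
        result["bozo"] = True
        result["bozo_exception"] = error
    return result


def _reset(url):
    # Hypothetical opener that simulates the peer dropping the connection.
    raise ConnectionResetError(104, "Connection reset by peer")


outcome = open_feed(_reset, "https://example.com/feed.xml")
print(outcome["bozo"], type(outcome["bozo_exception"]).__name__)
```

Both exception types derive from `OSError`, so catching `OSError` would also work, though it is broader than the two cases named here.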
Recently (after 5.1.3), the [feed2exec](https://feed2exec.readthedocs.io/) test suite started failing because this warning was mangling the output of the script: ``` /home/anarcat/src/feed2exec/.tox/py36/lib/python3.6/site-packages/feedparser.py:345: DeprecationWarning: To avoid breaking existing software while fixing...
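On the caller's side, a warning like this can be silenced with the standard `warnings` module. The sketch below is an illustration under assumptions: `noisy_parse` is a hypothetical stand-in for the library call that emits the warning, and `quiet_call` is a helper name invented here.

```python
import warnings


def noisy_parse(source):
    # Hypothetical stand-in for a library call that emits a DeprecationWarning.
    warnings.warn("legacy behaviour will change", DeprecationWarning, stacklevel=2)
    return {"source": source}


def quiet_call(func, *args, **kwargs):
    """Run func with DeprecationWarning suppressed, then restore the filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return func(*args, **kwargs)


# Record every warning that escapes, to check that none reaches the caller.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = quiet_call(noisy_parse, "feed.xml")

print(len(caught), result)
```

Because `catch_warnings()` restores the previous filters on exit, the suppression stays local to the wrapped call and does not hide warnings elsewhere in the program.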
The "podcast" namespace is getting quite some traction these days. It would be awesome to be able to properly parse feeds that use this namespace. Details about the "podcast" namespace:...
These pieces of information seem to be largely redundant (mappings of namespace URIs to prefixes), and only the former seems to actually be used in the code (to normalise namespaces / prefixes)....
_**This will likely be a breaking change.**_ feedparser's HTML sanitizing should not rely on custom internal code anymore. Using an external package like bleach will allow feedparser to focus more...