reader
A Python feed reader library.
reader treats all bozo feeds as errors, even if the loose parser managed to parse them:

```xml
title 2021-12-18T11:00:00 http://example.com/ http://example.com/entry 2021-07-29T00:00:00 ' & > “ < " ” ’...
```
Seems like a feedparser issue: https://github.com/kurtmckee/feedparser/issues/59

Repro:

```python
import reader, io, feedparser

feed_bytes = b"""\
one summary-one content-one two content-two summary-two
"""
parser = reader._parser.default_parser().get_parser_by_mime_type('application/atom+xml')
feed, entries = parser('url', io.BytesIO(feed_bytes))
...
```
https://pypi.org/project/defusedxml/

Related issue: https://github.com/kurtmckee/feedparser/issues/107

We have two (obvious) options here:

* contribute to feedparser, see the issue linked above (I don't know if it's easy to do)
* pass the...
Examples:

* https://python-patterns.guide/
* https://danluu.com/ (for #239)

It should be relatively easy to have a retriever/parser pair that handles URLs like (newlines added for clarity):

```
magic+
http://example.com/page.html?
magic-entries=&
magic-content=...
```
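To make the URL shape above concrete, here is a minimal sketch of how a retriever might separate the real page URL from the `magic-` options, using only the standard library. The function name `split_magic_url` is hypothetical, and the option values used below (`.post`, `.body`) are illustrative placeholders; the issue does not say what the values look like.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MAGIC_URL_PREFIX = "magic+"    # prefix taken from the example URL
MAGIC_PARAM_PREFIX = "magic-"  # query parameters reserved for the parser


def split_magic_url(url):
    """Return (real_url, options) for a 'magic+' URL (hypothetical helper).

    Query parameters starting with 'magic-' are removed from the URL and
    returned as a dict keyed by the part after the prefix.
    """
    if not url.startswith(MAGIC_URL_PREFIX):
        raise ValueError(f"not a magic URL: {url!r}")

    parts = urlsplit(url[len(MAGIC_URL_PREFIX):])

    options = {}
    remaining = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.startswith(MAGIC_PARAM_PREFIX):
            options[key[len(MAGIC_PARAM_PREFIX):]] = value
        else:
            # Ordinary query parameters stay part of the real URL.
            remaining.append((key, value))

    real_url = urlunsplit(parts._replace(query=urlencode(remaining)))
    return real_url, options


# Placeholder values; the real option syntax is an assumption.
real_url, options = split_magic_url(
    "magic+http://example.com/page.html?magic-entries=.post&magic-content=.body&page=2"
)
```

The retriever would then fetch `real_url` as usual and hand `options` to the matching parser.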
JSON feed content is not sanitized. This is not obvious from [the documentation](https://reader.readthedocs.io/en/latest/guide.html#advanced-feedparser-features), but it has big security implications. Perhaps it's a good idea to finally re-implement content sanitization outside...
User guide:

* [x] Replace `list(reader.get_entries())[:2]` with `reader.get_entries(limit=2)` throughout.
* [ ] (maybe) Add guidance (on naming etc.) for plugin authors.
* [ ] User-added entries (#239).

Read me:

*...
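As a rough illustration of why the first checklist item is worth doing, the sketch below compares slicing a fully built list with stopping an iterator early. `get_entries_stub` is a hypothetical stand-in that only counts how many items it produces. It is not the library's `get_entries()`, and whether `limit=` is applied lazily inside the library is an assumption here.

```python
from itertools import islice


def get_entries_stub(counter, total=1000):
    """Hypothetical stand-in for an entry iterator; counts produced items."""
    for i in range(total):
        counter["produced"] += 1
        yield f"entry-{i}"


# Old pattern: build the whole list, then keep the first two items.
eager_counter = {"produced": 0}
eager = list(get_entries_stub(eager_counter))[:2]

# Limited pattern: stop pulling from the iterator after two items.
lazy_counter = {"produced": 0}
lazy = list(islice(get_entries_stub(lazy_counter), 2))
```

Both variants return the same two items, but the first one produces every entry before discarding most of them.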
* pictures in https://mcfunley.com/manual-delivery * links in https://rachelbythebay.com/w/2019/08/04/olddocs/
Goal: end up with something similar to https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/architecture.md The point is to allow another *motivated* developer to add/test features, and take roughly the same decisions I would. Related to #60.
Hello! Thanks for your work. I was wondering how one could use your library to parse feeds directly from a str. I mean, I can pass any str like atom,...
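One generic way to approach this, independent of what the library itself supports, is to wrap the string in an in-memory binary file, much like the `io.BytesIO` used in the repro earlier on this page. The sketch below assumes UTF-8 text, and it uses `xml.etree.ElementTree` only as a stand-in consumer to show that the file object is readable. The helper name `feed_text_to_file` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_text_to_file(text, encoding="utf-8"):
    """Wrap feed text in a binary file-like object (hypothetical helper)."""
    return io.BytesIO(text.encode(encoding))


feed_text = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry><title>one</title></entry>
  <entry><title>two</title></entry>
</feed>"""

feed_file = feed_text_to_file(feed_text)

# Stand-in consumer: any parser that accepts a binary file object would do.
root = ET.parse(feed_file).getroot()
feed_title = root.findtext(ATOM_NS + "title")
entry_titles = [
    entry.findtext(ATOM_NS + "title") for entry in root.iter(ATOM_NS + "entry")
]
```

If the text carries an XML declaration naming a different encoding, the bytes should be encoded to match it.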
Example: https://simonwillison.net/atom/everything/ has 2 "sub-feeds", https://simonwillison.net/atom/entries/ and https://simonwillison.net/atom/links/.

I want to either:

* keep only one of them
* have them as separate feeds (so I can add different tags...