reader issues

reader treats all bozo feeds as errors

1

reader treats all bozo feeds as errors, even if the loose parser managed to parse them:

```xml
title 2021-12-18T11:00:00 http://example.com/ http://example.com/entry 2021-07-29T00:00:00 ' & > “ < " ” ’...
```

lemon24

bug

core

feed parsing
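
One possible reading of the bozo issue above: a bozo feed is only an error when the loose parser recovered nothing usable. A minimal sketch, assuming a feedparser-style result mapping with `bozo`, `bozo_exception`, `feed`, and `entries` keys; `is_fatal` and the sample results are hypothetical, not reader's actual logic:

```python
from typing import Any, Mapping


def is_fatal(result: Mapping[str, Any]) -> bool:
    """Return True only if a bozo parse result carries no usable data.

    Assumes a feedparser-style mapping; a bozo result that still has
    feed metadata or entries is treated as a warning, not an error.
    """
    if not result.get("bozo"):
        return False
    has_data = bool(result.get("feed")) or bool(result.get("entries"))
    return not has_data


# Hypothetical results, shaped like feedparser output.
recovered = {
    "bozo": 1,
    "bozo_exception": ValueError("undefined entity"),
    "feed": {"title": "title"},
    "entries": [{"id": "http://example.com/entry"}],
}
broken = {
    "bozo": 1,
    "bozo_exception": ValueError("not a feed"),
    "feed": {},
    "entries": [],
}
```

With this rule, `recovered` would be accepted (possibly with a logged warning) and only `broken` would raise.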

Atom summary/content is order-dependent

Seems like a feedparser issue: https://github.com/kurtmckee/feedparser/issues/59

Repro:

```python
import reader, io, feedparser

feed_bytes = b"""\
one summary-one content-one two content-two summary-two
"""

parser = reader._parser.default_parser().get_parser_by_mime_type('application/atom+xml')
feed, entries = parser('url', io.BytesIO(feed_bytes))
...
```

lemon24

bug

core

feed parsing
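
To illustrate what order-independent behavior would look like, here is a sketch that uses only the standard library instead of feedparser. The feed markup is an assumed reconstruction (the original markup in the repro did not survive), and `summary_and_content` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Assumed shape of the repro feed: same two elements, opposite order.
FEED = b"""\
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>one</id>
    <summary>summary-one</summary>
    <content>content-one</content>
  </entry>
  <entry>
    <id>two</id>
    <content>content-two</content>
    <summary>summary-two</summary>
  </entry>
</feed>
"""


def summary_and_content(data: bytes) -> dict:
    """Map entry id -> (summary, content), regardless of element order."""
    root = ET.fromstring(data)
    result = {}
    for entry in root.iter(ATOM + "entry"):
        # findtext() looks elements up by name, so document order is irrelevant.
        result[entry.findtext(ATOM + "id")] = (
            entry.findtext(ATOM + "summary"),
            entry.findtext(ATOM + "content"),
        )
    return result
```

Both entries should yield a distinct summary and content; a parser that returns different values for `one` and `two` beyond the text itself is order-dependent.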

Consider using defusedxml

3

https://pypi.org/project/defusedxml/

Related issue: https://github.com/kurtmckee/feedparser/issues/107

We have two (obvious) options here:

* contribute to feedparser, see the issue linked above (I don't know if it's easy to do)
* pass the...

lemon24

core

feed parsing
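
For context, the kind of check involved can be sketched with the standard library's expat bindings: reject any document that declares entities, which is the vector for entity-expansion attacks. This is not defusedxml's API and not a substitute for it; `EntitiesForbidden` and `check_no_entities` are hypothetical names used for illustration:

```python
import xml.parsers.expat


class EntitiesForbidden(ValueError):
    """Raised when a document declares an XML entity."""


def check_no_entities(data: bytes) -> None:
    """Parse data and raise EntitiesForbidden on any entity declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def forbid_entity(name, *args):
        # Exceptions raised in expat handlers propagate out of Parse().
        raise EntitiesForbidden(f"entity {name!r} is not allowed")

    parser.EntityDeclHandler = forbid_entity
    parser.Parse(data, True)
```

A pre-check like this could run on the raw bytes before they are handed to the feed parser, at the cost of parsing the document twice.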

Some websites don't have feeds

1

Examples:

* https://python-patterns.guide/
* https://danluu.com/ (for #239)

It should be relatively easy to have a retriever/parser pair that handles URLs like (newlines added for clarity):

```
magic+
http://example.com/page.html?
magic-entries=&
magic-content=...
```

lemon24

feed parsing
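
The first step for such a retriever would be separating the real page URL from the `magic-*` options. A minimal sketch, assuming the options travel as ordinary query parameters; `split_magic_url` is a hypothetical helper, and the option values in the usage note are made up:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PREFIX = "magic+"
PARAM_PREFIX = "magic-"


def split_magic_url(url: str):
    """Split a magic+ URL into (page URL, dict of magic-* options)."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a magic URL: {url!r}")
    parts = urlsplit(url[len(PREFIX):])
    options, rest = {}, []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.startswith(PARAM_PREFIX):
            # Strip the prefix: "magic-entries" becomes "entries".
            options[key[len(PARAM_PREFIX):]] = value
        else:
            # Non-magic parameters belong to the page itself.
            rest.append((key, value))
    page_url = urlunsplit(parts._replace(query=urlencode(rest)))
    return page_url, options
```

For example, `split_magic_url("magic+http://example.com/page.html?magic-entries=.post&page=2")` separates the page URL `http://example.com/page.html?page=2` from the options `{"entries": ".post"}`; what the option values mean (CSS selectors are assumed here) is up to the parser half of the pair.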

JSON feed content is not sanitized

JSON feed content is not sanitized. This is not obvious from [the documentation](https://reader.readthedocs.io/en/latest/guide.html#advanced-feedparser-features) either, but it has big security implications. Perhaps it's a good idea to finally re-implement content sanitization outside...

lemon24

core

feed parsing
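
To make the idea of parser-independent sanitization concrete, here is an allow-list sketch built on the standard library's `html.parser`. It is illustrative only and not a vetted sanitizer; the allowed tags, attributes, and URL schemes are assumptions, and a real implementation would need far more care:

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "a", "em", "strong", "ul", "ol", "li"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")
DROP_CONTENT = {"script", "style"}  # drop these tags and their text


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        # Keep only allow-listed attributes whose value is an http(s) URL.
        kept = [
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS.get(tag, ())
            and value
            and value.lower().startswith(SAFE_SCHEMES)
        ]
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is re-escaped so it can never introduce markup.
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    """Return html_text reduced to allow-listed tags and attributes."""
    parser = _Sanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Because it operates on a plain string, a function like this could be applied to entry content from any parser (Atom, RSS, or JSON Feed) after parsing, rather than relying on feedparser's built-in sanitization.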

Documentation updates

User guide:

* [x] Replace `list(reader.get_entries())[:2]` with `reader.get_entries(limit=2)` throughout.
* [ ] (maybe) Add guidance (on naming etc.) for plugin authors.
* [ ] User-added entries (#239).

Read me:

*...

lemon24

documentation

Broken relative links

6

* pictures in https://mcfunley.com/manual-delivery
* links in https://rachelbythebay.com/w/2019/08/04/olddocs/

lemon24

bug

web app

core
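
One way relative links could be made absolute is to resolve them against a base URL with `urllib.parse.urljoin`. A minimal sketch, assuming the entry or feed URL is available as the base; `resolve_links` and the example URLs are hypothetical:

```python
from urllib.parse import urljoin


def resolve_links(base_url: str, links: list[str]) -> list[str]:
    """Resolve each link against base_url; absolute links pass through."""
    return [urljoin(base_url, link) for link in links]


# The trailing slash on the base changes how relative paths resolve.
with_slash = resolve_links("http://example.com/blog/post/", ["img.png", "/about"])
without_slash = resolve_links("http://example.com/blog/post", ["img.png"])
```

Here `with_slash` resolves `img.png` under `/blog/post/`, while `without_slash` resolves it under `/blog/`, so the choice of base (feed URL, entry link, or an `xml:base` value) matters.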

Document architecture

Goal: end up with something similar to https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/architecture.md

The point is to allow another *motivated* developer to add/test features, and make roughly the same decisions I would.

Related to #60.

lemon24

documentation

How to use reader to parse a string directly?

3

Hello! Thanks for your work. I was wondering how one could use your library to parse feeds directly from a str. I mean, I can pass any str like atom,...

sorasful

Can't split feeds

1

Example: https://simonwillison.net/atom/everything/ has 2 "sub-feeds", https://simonwillison.net/atom/entries/ and https://simonwillison.net/atom/links/.

I want to either:

* keep only one of them
* have them as separate feeds (so I can add different tags...

lemon24

API

core
