stalla
Add lenient parsing mode
Right now, the parser refuses to parse any feed that doesn't perfectly adhere to the specs. Alas, feeds out there can be incomplete and/or out of spec (e.g., specifying <itunes:explicit>explicit</itunes:explicit>). I think it would be very useful to have a lenient parsing mode, which creates a Podcast element even if the feed is non-compliant.
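To make the strict/lenient distinction concrete, here is a minimal sketch of how a single field like <itunes:explicit> could be interpreted in each mode. ParsingMode and parseExplicit are hypothetical names, not part of Stalla's API. It also assumes that strict mode accepts only the literal values "true" and "false", and that "yes"/"explicit"/"no"/"clean" are the out-of-spec values worth tolerating.

```kotlin
// Hypothetical sketch: none of these names exist in Stalla today.
enum class ParsingMode { STRICT, LENIENT }

/**
 * Interprets the text content of an <itunes:explicit> tag.
 * Returns null when the value can't be interpreted in the given mode.
 */
fun parseExplicit(rawValue: String, mode: ParsingMode): Boolean? = when (mode) {
    // Strict: only the (assumed) spec-compliant literals are accepted, verbatim.
    ParsingMode.STRICT -> when (rawValue) {
        "true" -> true
        "false" -> false
        else -> null
    }
    // Lenient: trim, ignore case, and map common out-of-spec values (assumed list).
    ParsingMode.LENIENT -> when (rawValue.trim().lowercase()) {
        "true", "yes", "explicit" -> true
        "false", "no", "clean" -> false
        else -> null
    }
}
```

With this shape, parseExplicit("explicit", ParsingMode.STRICT) yields null (the feed would be rejected today), while the same value in LENIENT mode yields true.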
This is useful both for users of the library who want to deal with the zoo of malformed feeds out there (e.g., a player app or an indexing service), and for our own validation feature (see #46). In all likelihood, we will need to create a set of "raw" models (e.g., a RawPodcast) which contain all the data the "real" models have, but without proper typing and checks. These raw models are read from the feed with minimal transformations, and can easily be validated and, if valid, converted into "real" models.
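A minimal sketch of the raw/real model split, assuming raw models keep every field as a nullable String and the conversion simply returns null when anything is missing or malformed. RawPodcast matches the name suggested above; RawEpisode, toEpisodeOrNull, toPodcastOrNull, and the field selection are hypothetical and heavily simplified compared to Stalla's actual Podcast model.

```kotlin
// Raw models: everything nullable and untyped, read as-is from the feed.
data class RawEpisode(val title: String?, val enclosureUrl: String?)
data class RawPodcast(
    val title: String?,
    val link: String?,
    val explicit: String?,
    val episodes: List<RawEpisode>
)

// Simplified "real" models: non-null, properly typed fields only.
data class Episode(val title: String, val enclosureUrl: String)
data class Podcast(val title: String, val link: String, val explicit: Boolean, val episodes: List<Episode>)

fun RawEpisode.toEpisodeOrNull(): Episode? {
    val episodeTitle = title ?: return null
    val url = enclosureUrl ?: return null
    return Episode(episodeTitle, url)
}

fun RawPodcast.toPodcastOrNull(): Podcast? {
    val podcastTitle = title ?: return null
    val podcastLink = link ?: return null
    val isExplicit = when (explicit?.trim()) {
        "true" -> true
        "false" -> false
        else -> return null // out-of-spec value: not convertible
    }
    // map is inline, so `return null` exits toPodcastOrNull as soon as one episode is invalid.
    val typedEpisodes = episodes.map { it.toEpisodeOrNull() ?: return null }
    return Podcast(podcastTitle, podcastLink, isExplicit, typedEpisodes)
}
```

The all-or-nothing conversion here is just one option; a real implementation would more likely report why the conversion failed instead of returning a bare null.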
This would allow us to:
- Read a close-to-the-ground-truth version of feeds, including incomplete and invalid ones
- Extract the validation logic from the builders into validators
- Transform the parsing logic into a three-step process:
  - Read raw data
  - Validate raw data
  - Marshal valid raw data into final, properly typed models
- The final models could still be invalid (e.g., have no episodes), but they won't contain invalid data; returning the validation results alongside the parsed model would let users decide whether they're OK with an invalid feed, or attempt to remediate the issues with custom logic
- Exposing the validation step would then be trivial, and we could also expose the full list of validation issues, if any (API TBD, but something like a sealed ValidationResult may be what we need)
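The three-step process and the validation results could fit together roughly like this. This is a sketch under several assumptions: step 1 (reading the XML into a raw model) is not shown, and the input is a hand-built RawPodcast. The shape of ValidationResult is only a guess, since the API is TBD. The ERROR/WARNING split is an assumed way to model "still invalid, but no invalid data" (e.g., a feed with no episodes). All names are hypothetical.

```kotlin
// Step 1 (not shown): an XML reader fills this with minimal transformations.
data class RawPodcast(val title: String?, val explicit: String?, val episodeTitles: List<String?>)
data class Podcast(val title: String, val explicit: Boolean, val episodeTitles: List<String>)

enum class Severity { ERROR, WARNING }
data class ValidationIssue(val field: String, val severity: Severity, val message: String)

sealed class ValidationResult {
    abstract val issues: List<ValidationIssue>
    // No errors; may still carry warnings (e.g., the feed has no episodes).
    data class Valid(override val issues: List<ValidationIssue>) : ValidationResult()
    // At least one error: the raw data can't be marshalled into a typed model.
    data class Invalid(override val issues: List<ValidationIssue>) : ValidationResult()
}

// Step 2: collect every issue instead of failing on the first one.
fun validate(raw: RawPodcast): ValidationResult {
    val issues = mutableListOf<ValidationIssue>()
    if (raw.title.isNullOrBlank()) issues += ValidationIssue("title", Severity.ERROR, "Missing title")
    val explicit = raw.explicit?.trim()
    if (explicit != "true" && explicit != "false") {
        issues += ValidationIssue("explicit", Severity.ERROR, "Invalid explicit value: ${raw.explicit}")
    }
    raw.episodeTitles.forEachIndexed { i, title ->
        if (title.isNullOrBlank()) issues += ValidationIssue("episodes[$i].title", Severity.ERROR, "Missing episode title")
    }
    if (raw.episodeTitles.isEmpty()) issues += ValidationIssue("episodes", Severity.WARNING, "Feed has no episodes")
    return if (issues.any { it.severity == Severity.ERROR }) ValidationResult.Invalid(issues)
    else ValidationResult.Valid(issues)
}

// Step 3: only called on validated data, so the !! assertions cannot fail here.
fun marshal(raw: RawPodcast): Podcast = Podcast(
    title = raw.title!!,
    explicit = raw.explicit!!.trim() == "true",
    episodeTitles = raw.episodeTitles.map { it!! }
)

// The validation result is returned alongside the (possibly missing) model.
data class ParseResult(val podcast: Podcast?, val validation: ValidationResult)

fun parse(raw: RawPodcast): ParseResult {
    val validation = validate(raw)
    val podcast = if (validation is ValidationResult.Valid) marshal(raw) else null
    return ParseResult(podcast, validation)
}
```

A caller could then check result.validation.issues and decide whether a feed with only warnings is acceptable, or inspect the errors of an Invalid result to attempt custom remediation on the raw data.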
The changes to both the infrastructure and the APIs needed to achieve this are massive, but I think it's very important for real-life usage. I realised how much I needed this last week, when I first used this library for some hacking around feed merging...
Agreed 💯. We should definitely have some way of accessing invalid data for certain use cases.
We can also integrate the idea from #46 into this.
This kind of brings me to the big question: should we get this — and thus v2.0.0 — done asap, minimising the amount of reworking we need to do across all the namespaces, or should we prioritise adding support for further namespaces?
I'd say we do this asap and get it out of the way. That way, we can also clean up the deprecated methods early.
Agreed!