v2: Add streaming support for large feeds
## Overview
Implement true streaming support for gofeed to handle large feeds efficiently without loading entire documents into memory.
## Tasks
### Feed Detection
- [ ] Rewrite feed type detection to use a fixed-size buffer (e.g., 8KB)
- [ ] Use `io.MultiReader` to reconstruct the complete reader after detection (see the sketch below)
- [ ] Ensure no data is lost during the detection phase
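A minimal sketch of what the detection step could look like, assuming a hypothetical `detectAndRewind` helper, a `feedstream` package name, and an 8KB buffer; none of these names are part of gofeed's current API:

```go
package feedstream

import (
	"bytes"
	"io"
)

// detectBufSize is the fixed-size detection buffer (8KB is an assumption).
const detectBufSize = 8 * 1024

// detectAndRewind reads at most detectBufSize bytes to guess the feed type,
// then stitches the consumed prefix back in front of the remaining reader
// with io.MultiReader so no data is lost.
func detectAndRewind(r io.Reader) (feedType string, full io.Reader, err error) {
	buf := make([]byte, detectBufSize)
	n, err := io.ReadFull(r, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return "", nil, err
	}
	head := buf[:n]

	// Simplified heuristic: inspect only the prefix.
	switch {
	case bytes.Contains(head, []byte("<rss")):
		feedType = "rss"
	case bytes.Contains(head, []byte("<feed")):
		feedType = "atom"
	case bytes.HasPrefix(bytes.TrimSpace(head), []byte("{")):
		feedType = "json"
	default:
		feedType = "unknown"
	}

	full = io.MultiReader(bytes.NewReader(head), r)
	return feedType, full, nil
}
```

The important part is that the parser afterwards consumes `full`, not `r`, so the bytes used for detection are replayed rather than lost.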
### Streaming Parse Methods
- [ ] Design a streaming API that returns an iterator/channel of items
- [ ] Implement it for each format (RSS, Atom, JSON)
- [ ] Add MaxItems support that actually stops reading, not just skipping (see the sketch after this list)
- [ ] Consider error handling in the streaming context
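As a sketch of what "actually stops reading" could mean for the RSS case, the loop below pulls tokens from an `encoding/xml` decoder one item at a time and returns as soon as the limit is hit; `streamRSSItems`, the minimal `Item` struct, and the callback shape are placeholders, not gofeed's real types:

```go
package feedstream

import (
	"encoding/xml"
	"io"
)

// Item is a placeholder for a parsed feed item (not gofeed's real type).
type Item struct {
	Title string `xml:"title"`
	Link  string `xml:"link"`
}

// streamRSSItems decodes <item> elements one at a time and invokes yield for
// each. It returns as soon as yield reports false or maxItems is reached, so
// the rest of the document is never read from r.
func streamRSSItems(r io.Reader, maxItems int, yield func(Item) bool) error {
	dec := xml.NewDecoder(r)
	count := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "item" {
			continue
		}
		var it Item
		if err := dec.DecodeElement(&it, &start); err != nil {
			return err
		}
		if !yield(it) {
			return nil
		}
		count++
		if maxItems > 0 && count >= maxItems {
			return nil // stop reading early instead of skipping remaining items
		}
	}
}
```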
## Benefits
- Handle arbitrarily large feeds without memory issues
- Process items as they're parsed rather than after full parse
- True MaxItems support that stops reading early
- Better performance for large feeds
## API Considerations
```go
// Possible API designs
iter, err := parser.ParseStream(reader, opts)
for iter.Next() {
	item := iter.Item()
	// Process item
}

// Or channel-based
items, errs := parser.ParseStreamChan(reader, opts)
for item := range items {
	// Process item
}
```
## Related Issues
- Part of v2 RFC (#241)
An iterator-based approach should probably return an `iter.Seq[T]` and follow the Go iterator protocol instead of using `.Next()`. That way one can use it in a `range` expression.
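For illustration only, a range-over-func shape along those lines (Go 1.23+) might look like the sketch below; `ParseStream` returning `iter.Seq2[*Item, error]` is just one possible signature, and `Parser`/`Item` are stand-ins rather than gofeed's actual types:

```go
package feedstream

import (
	"io"
	"iter"
)

// Item and Parser are placeholders for gofeed's real types in this sketch.
type Item struct {
	Title string
	Link  string
}

type Parser struct{}

// ParseStream returns a range-over-func sequence that pairs each item with a
// possible error; callers stop early simply by breaking out of the range loop.
func (p *Parser) ParseStream(r io.Reader) iter.Seq2[*Item, error] {
	return func(yield func(*Item, error) bool) {
		// A real implementation would decode items lazily from r; this stub
		// only demonstrates the iterator protocol and early termination.
		stub := []Item{{Title: "first"}, {Title: "second"}}
		for i := range stub {
			if !yield(&stub[i], nil) {
				return // caller broke out of its range loop
			}
		}
	}
}
```

Usage would then be a plain `for item, err := range p.ParseStream(r) { ... }`, which keeps MaxItems-style early termination in the caller's hands (a `break` stops the decoder) rather than requiring a separate drain or close step.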
Hey! I really appreciate all the work you've done with gofeed. I would like to contribute here. Since there is no CONTRIBUTING.md, could you tell me if there are any guidelines to follow and how you'd like to collaborate on the specs? I can also start by adding the CONTRIBUTING.md file if you like. Thanks!
> An iterator-based approach should probably return an `iter.Seq[T]` and follow the Go iterator protocol instead of using `.Next()`. That way one can use it in a `range` expression.
That's a good idea. However, I see that there was a problem with using Go 1.23 here, and iterators are only available beginning with that version. I'm going to try to look into this over the weekend.