
v2: Add streaming support for large feeds

Open mmcdole opened this issue 7 months ago • 3 comments

Overview

Implement true streaming support for gofeed to handle large feeds efficiently without loading entire documents into memory.

Tasks

Feed Detection

  • [ ] Rewrite feed type detection to use a fixed-size buffer (e.g., 8KB)
  • [ ] Use io.MultiReader to reconstruct the complete reader after detection
  • [ ] Ensure no data is lost during detection phase

Streaming Parse Methods

  • [ ] Design streaming API that returns iterator/channel of items
  • [ ] Implement for each format (RSS, Atom, JSON)
  • [ ] Add MaxItems support that actually stops reading (not just skipping)
  • [ ] Consider error handling in streaming context

Benefits

  • Handle arbitrarily large feeds without memory issues
  • Process items as they're parsed rather than after full parse
  • True MaxItems support that stops reading early
  • Better performance for large feeds

API Considerations

// Possible API designs
iter, err := parser.ParseStream(reader, opts)
for iter.Next() {
    item := iter.Item()
    // Process item
}

// Or channel-based
items, errs := parser.ParseStreamChan(reader, opts)
for item := range items {
    // Process item
}

Related Issues

  • Part of v2 RFC (#241)

mmcdole avatar May 26 '25 14:05 mmcdole

Iter based approach should probably return an iter.Seq[T] and follow the Golang iterator protocol instead of using .Next(). That way one can use it in a range expression.

Necoro avatar May 27 '25 00:05 Necoro

Hey! I really appreciate all the work you've done with gofeed. I would like to contribute here. Since there is no CONTRIBUTING.md, could you tell me if there are any guidelines to follow and how you'd like to collaborate on the specs? I can also start by adding the CONTRIBUTING.md file if you like. Thanks!

shashaBot avatar May 30 '25 23:05 shashaBot

> Iter based approach should probably return an iter.Seq[T] and follow the Golang iterator protocol instead of using .Next(). That way one can use it in a range expression.

That's a good idea. However, I see that there was a problem with using Go 1.23 here, and iterators are only available starting with that version. I'm going to try to look into this over the weekend.

shashaBot avatar May 30 '25 23:05 shashaBot