
RFC: gofeed v2 – Proposed Changes

Open mmcdole opened this issue 7 months ago • 4 comments

Gofeed v2 - Proposed Changes & Implementation Progress

I'm trying to pull together thoughts from a number of issues into a plan for a v2 of gofeed. This document outlines ideas for improving the library: addressing current limitations, making data access easier, and refreshing the API. Since it's a v2, breaking changes are on the table if they lead to a better overall experience.

The aim is to focus on the following core themes:

  1. Enhanced Data Access & Preservation: Ensuring users can access all data present in a feed, including format-specific fields, the original parsed structure, and relevant HTTP response metadata.
  2. Comprehensive & Unified Custom/Extension Element Handling: Robustly parsing and exposing all XML/JSON elements, including those not part of standard feed specifications or common extensions, in a structured and navigable way.
  3. Improved Parser Configuration & Control: Providing users with more granular control over the parsing process, including performance tuning, strictness, HTTP request parameters, and conditional fetching.

Implementation Checklist

Core API Changes

Related issues: #244, #251, #205, #82, #246, #229, #235, #228

  • [x] Remove Item.Custom field, enhance Extensions for structured data
  • [x] Implement ParseOptions foundation with RequestOptions sub-struct
  • [x] Context-first ParseURL API - removed ParseURLWithContext
  • [x] Expose original format-specific feed data via KeepOriginalFeed

Parser Improvements

Related issues: #248, #250

  • [x] Update format-specific parsers with public constructors
  • [ ] Add strictness and robustness controls
  • [ ] ParseDates toggle implementation

Streaming Support

Related issues: #256

  • [ ] Update feed detection to use fixed buffer instead of reading entire file
  • [ ] Implement streaming parse methods that return an iterator/channel of items
  • [ ] Add MaxItems support that actually stops reading (not just skipping)

Network & HTTP

Related issues: #247, #111, #165

  • [ ] HTTP response metadata (ETag, Last-Modified, Cache-Control)
  • [ ] Conditional request support (If-None-Match, If-Modified-Since)
  • [ ] Rate limiting support via Retry-After header
  • [ ] Custom HTTP client configuration

Architecture

Related issues: #249, #255

  • [ ] Refactor translator interfaces for type safety
  • [ ] Comprehensive error handling system with typed errors

Dependencies & Module Structure

Related issues: #128, #254

  • [x] Keep ftest in main module based on community feedback
  • [x] Remove unnecessary dependencies (json-iterator, goquery)

Key Design Decisions

1. ParseOptions Structure

Implemented as a single struct with sub-structs for organization:

// Requires "net/http" and "time". StrictnessOptions and Auth are not shown here.
type ParseOptions struct {
    // Core parsing options
    KeepOriginalFeed  bool
    ParseDates        bool
    StrictnessOptions StrictnessOptions

    // HTTP request configuration
    RequestOptions RequestOptions
}

type RequestOptions struct {
    UserAgent       string
    Timeout         time.Duration
    IfNoneMatch     string // For conditional requests
    IfModifiedSince time.Time
    Client          *http.Client
    AuthConfig      *Auth
}

All parsing methods accept a *ParseOptions, which may be nil to use the defaults. I decided against variadic options to keep the API simple.

2. Extension System

The new extension system replaces the limited map[string]string of Item.Custom with a structured approach:

// Access custom elements
weight := item.GetCustomValue("weight")

// Access with attributes
ext := item.GetExtension("_custom", "priority")
if len(ext) > 0 {
    value := ext[0].Value
    level := ext[0].Attrs["level"]
}

Non-namespaced elements in RSS/Atom are stored under the "_custom" namespace to avoid conflicts.
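For readers who want to see how the two accessors relate, here is a rough, self-contained sketch. The Value and Attrs fields and the method names come from the snippet above; the Name and Children fields, the nested-map layout, and the local Item type are assumptions and may not match the real v2 types.

```go
package main

import "fmt"

// Extension carries one parsed element. Value and Attrs appear in the
// snippet above; Name and Children are assumptions added for illustration.
type Extension struct {
	Name     string
	Value    string
	Attrs    map[string]string
	Children map[string][]Extension
}

// Extensions is keyed by namespace prefix, then by element name (assumed layout).
type Extensions map[string]map[string][]Extension

// Item is a local stand-in holding only the extension data.
type Item struct {
	Extensions Extensions
}

// GetExtension returns every element stored under namespace/name, or nil.
func (i *Item) GetExtension(namespace, name string) []Extension {
	return i.Extensions[namespace][name] // reading from a nil map is safe
}

// GetCustomValue returns the text of the first non-namespaced element with
// the given name, or "" when there is none.
func (i *Item) GetCustomValue(name string) string {
	if exts := i.GetExtension("_custom", name); len(exts) > 0 {
		return exts[0].Value
	}
	return ""
}

func main() {
	item := &Item{Extensions: Extensions{
		"_custom": {
			"weight": {{Name: "weight", Value: "42"}},
			"priority": {{
				Name:  "priority",
				Value: "high",
				Attrs: map[string]string{"level": "5"},
			}},
		},
	}}

	fmt.Println(item.GetCustomValue("weight"))
	if ext := item.GetExtension("_custom", "priority"); len(ext) > 0 {
		fmt.Println(ext[0].Value, ext[0].Attrs["level"])
	}
}
```

In this sketch GetCustomValue is just a convenience wrapper over GetExtension with the "_custom" namespace, which is why a missing element yields an empty string rather than an error.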

3. API Consistency

All parse methods now have consistent signatures:

  • Parse(reader io.Reader, opts *ParseOptions) (*Feed, error)
  • ParseString(str string, opts *ParseOptions) (*Feed, error)
  • ParseURL(ctx context.Context, url string, opts *ParseOptions) (*Feed, error)

Context is required for ParseURL to follow modern Go practices.

4. Streaming Considerations

Feed type detection currently loads the entire feed into memory. The plan is to:

  1. Use a fixed-size buffer (e.g., 8KB) for type detection
  2. Reconstruct the complete reader using io.MultiReader
  3. Enable true streaming for large feeds
  4. Implement an iterator pattern for processing items without loading them all into memory
  5. Add MaxItems support that actually stops reading when the limit is reached

5. Error Handling Philosophy

Moving toward typed errors with context:

  • Parse location (line/column when available)
  • Field that caused the error
  • Strictness-aware (warnings in lenient mode become errors in strict mode)
  • Categories: Parse, Validation, Network, Format, Extension errors

Migration Guide

Key breaking changes for v2:

// Old (v1)
parser.UserAgent = "MyApp"
feed, err := parser.Parse(reader)
feed, err = parser.ParseURL(url)
value := item.Custom["field"]

// New (v2)
opts := &gofeed.ParseOptions{
    RequestOptions: gofeed.RequestOptions{
        UserAgent: "MyApp",
    },
}
feed, err := parser.Parse(reader, opts) // opts may be nil for defaults
feed, err = parser.ParseURL(context.Background(), url, opts)
value := item.GetCustomValue("field")

Feedback

Please give feedback or suggested changes to the plan for gofeed v2.

@infogulch @cristoper @spacecowboy and others, if you have suggestions for other changes, or comments about the above, let me know.

mmcdole, May 25 '25 16:05

Also, as of right now, the ParseURL functions exist only on gofeed.Parser. At least in this v2 cut, I haven't extended network fetching of feeds to the feed specific parsers. I'm curious how people feel about that.

mmcdole, May 25 '25 16:05

Sounds amazing 🎉

At least in this v2 cut, I haven't extended network fetching of feeds to the feed specific parsers. I'm curious how people feel about that.

Feeder only uses gofeed to parse the request bodies. It uses Kotlin libraries to do the actual HTTP fetching.

spacecowboy, May 25 '25 17:05

Update: Reverted the ftest module separation (commit 48d44a8).

Turns out Go already handles this well - unused dependencies don't get downloaded when you import the library. So keeping ftest in the main module is simpler and works just as well.

Thanks @infogulch for the feedback on #254!

mmcdole, May 26 '25 16:05

Looks like you got some fire under you this weekend! I've glanced through everything and I like the overall direction. My life is pretty busy at the moment, but I'll try to keep up with your PRs to provide feedback where it may be helpful. Thank you for your efforts!

infogulch, May 26 '25 17:05