gofeed icon indicating copy to clipboard operation
gofeed copied to clipboard

v2: Implement ParseOptions and Update Parser API

Open mmcdole opened this issue 7 months ago • 4 comments

Overview

Implement the new ParseOptions structure and update all parser APIs to accept it as part of gofeed v2.

Tasks

  • [ ] Create ParseOptions struct with the following fields:

    • KeepOriginalFeed bool - Retain reference to original format-specific feed
    • StrictnessOptions - Sub-struct for parsing strictness controls
    • RequestOptions - HTTP request configuration for ParseURL
      • UserAgent string
      • Timeout time.Duration
      • IfNoneMatch string (ETag support)
      • IfModifiedSince time.Time (conditional requests)
      • Client *http.Client (custom HTTP client support)
    • Other parsing toggles (e.g., ParseDates bool)
  • [ ] Update universal parser (gofeed.Parser) methods:

    • Parse(io.Reader, *ParseOptions) (*Feed, error)
    • ParseURL(string, *ParseOptions) (*Feed, error)
    • ParseString(string, *ParseOptions) (*Feed, error)
  • [ ] Implement DefaultParseOptions() function for common use cases

Related Issues

  • #199 (MaxItems support)
  • #111 (Conditional requests)
  • #165 (Proxy/custom HTTP client support)
  • #229, #235 (KeepOriginalFeed)

Parent Issue

  • #241 (RFC: gofeed v2)

mmcdole avatar May 26 '25 04:05 mmcdole

Hey folks, I've implemented ParseOptions for v2 and wanted to get some feedback on the API.

What changed

v1:

parser := gofeed.NewParser()
parser.UserAgent = "MyApp/1.0"
parser.Client = customClient

feed, _ := parser.Parse(reader)
feed, _ := parser.ParseURL(url)
feed, _ := parser.ParseURLWithContext(url, ctx)

v2:

parser := gofeed.NewParser()

// Simple case - just pass nil
feed, _ := parser.Parse(reader, nil)
feed, _ := parser.ParseURL(ctx, url, nil)

// With options
opts := &gofeed.ParseOptions{
    MaxItems: 10,
    RequestOptions: gofeed.RequestOptions{
        UserAgent: "MyApp/1.0",
        Client: customClient,
    },
}
feed, _ := parser.Parse(reader, opts)
feed, _ := parser.ParseURL(ctx, url, opts)

Main changes:

  • Everything takes *ParseOptions now (can be nil)
  • No more ParseURLWithContext - just ParseURL with context as first param
  • Request options are nested under RequestOptions field

I considered variadic options but seemed like overkill. Thoughts on this approach?

mmcdole avatar May 26 '25 21:05 mmcdole

A single opinion: Passing a pointer to a structure implies that it is changed by the callee. I understand it is this way to have nil, but feels odd nevertheless.

To be completely honest: I like the orginal approach more, because it feels more natural and is also the Gopher way (see http.Client and Co.). It has the drawback to be harder to notice (an option argument forces one to think about options; them being "hidden" in an object can lead to them being overlooked).

An option object is especially useful if you have different call sites that all share the same (or similar) options. I don't know if it is applicable for gofeed -- users will probably have one entry point for feed parsing.

Necoro avatar May 26 '25 21:05 Necoro

Thanks for the feedback @Necoro! This is probably the most important design choice in terms of the future API "UX" and how people feel about using the library for v2. I want to get this right.

I've been wrestling with this design because we have two different types of configuration with different lifetimes:

  1. Parser-level config (set once, used for all feeds):

    • Strictness settings (how permissive to be with malformed feeds)
    • Default behaviors (e.g., whether to parse dates)
    • HTTP client configuration
  2. Per-request data (different for each feed URL):

    • ETag from previous fetch (for If-None-Match)
    • Last-Modified timestamp (for If-Modified-Since)
    • These come from previous responses and are specific to each feed

Here are the main options I'm considering:

Option A: Parser config + Per-request options (hybrid)

// Setup parser with parser-level config
parser := gofeed.NewParser(&gofeed.Config{
    StrictMode: true,
    Client: customClient,
})

// Per-request options when needed
feed, _ := parser.Parse(reader)  // most calls don't need options
feed, _ := parser.ParseURL(ctx, url, &RequestOptions{
    IfNoneMatch: "previous-etag-for-this-url",
})

Option B: Everything in ParseOptions (current PR)

parser := gofeed.NewParser()
opts := &ParseOptions{
    StrictMode: true,  // has to be set every time :(
    RequestOptions: RequestOptions{
        IfNoneMatch: "etag",
    },
}
feed, _ := parser.ParseURL(ctx, url, opts)

Option C: Variadic to make it truly optional

// Most common case - no options
feed, _ := parser.Parse(reader)

// When you need options
feed, _ := parser.Parse(reader, ParseOptions{
    KeepOriginalFeed: true,
})

@Necoro @infogulch @spacecowboy @cristoper let me know if you have thoughts.

mmcdole avatar May 26 '25 23:05 mmcdole

My choice would be C > A > B.

But: In the end, I think the impact to be minor. I'd wager that most consumers will have one(!) point where they parse feeds¹ -- and then it's "study documentation once, implement, never touch again", so the actual calling convention is rather unimportant.

The API surrounding dealing with the parsed feeds: that's what you are going to use all over the place (because it's likely too complex to completely wrap into an internal structure).

¹ example from my feed2imap-go: https://github.com/Necoro/feed2imap-go/blob/master/internal/feed/parse.go

Necoro avatar May 26 '25 23:05 Necoro