v2: Implement ParseOptions and Update Parser API
Overview
Implement the new ParseOptions structure and update all parser APIs to accept it as part of gofeed v2.
Tasks
-
[ ] Create
ParseOptionsstruct with the following fields:-
KeepOriginalFeed bool- Retain reference to original format-specific feed -
StrictnessOptions- Sub-struct for parsing strictness controls -
RequestOptions- HTTP request configuration for ParseURL-
UserAgent string -
Timeout time.Duration -
IfNoneMatch string(ETag support) -
IfModifiedSince time.Time(conditional requests) -
Client *http.Client(custom HTTP client support)
-
- Other parsing toggles (e.g.,
ParseDates bool)
-
-
[ ] Update universal parser (
gofeed.Parser) methods:-
Parse(io.Reader, *ParseOptions) (*Feed, error) -
ParseURL(string, *ParseOptions) (*Feed, error) -
ParseString(string, *ParseOptions) (*Feed, error)
-
-
[ ] Implement
DefaultParseOptions()function for common use cases
Related Issues
- #199 (MaxItems support)
- #111 (Conditional requests)
- #165 (Proxy/custom HTTP client support)
- #229, #235 (KeepOriginalFeed)
Parent Issue
- #241 (RFC: gofeed v2)
Hey folks, I've implemented ParseOptions for v2 and wanted to get some feedback on the API.
What changed
v1:
parser := gofeed.NewParser()
parser.UserAgent = "MyApp/1.0"
parser.Client = customClient
feed, _ := parser.Parse(reader)
feed, _ := parser.ParseURL(url)
feed, _ := parser.ParseURLWithContext(url, ctx)
v2:
parser := gofeed.NewParser()
// Simple case - just pass nil
feed, _ := parser.Parse(reader, nil)
feed, _ := parser.ParseURL(ctx, url, nil)
// With options
opts := &gofeed.ParseOptions{
MaxItems: 10,
RequestOptions: gofeed.RequestOptions{
UserAgent: "MyApp/1.0",
Client: customClient,
},
}
feed, _ := parser.Parse(reader, opts)
feed, _ := parser.ParseURL(ctx, url, opts)
Main changes:
- Everything takes
*ParseOptionsnow (can be nil) - No more
ParseURLWithContext- justParseURLwith context as first param - Request options are nested under
RequestOptionsfield
I considered variadic options but seemed like overkill. Thoughts on this approach?
A single opinion: Passing a pointer to a structure implies that it is changed by the callee. I understand it is this way to have nil, but feels odd nevertheless.
To be completely honest: I like the orginal approach more, because it feels more natural and is also the Gopher way (see http.Client and Co.). It has the drawback to be harder to notice (an option argument forces one to think about options; them being "hidden" in an object can lead to them being overlooked).
An option object is especially useful if you have different call sites that all share the same (or similar) options. I don't know if it is applicable for gofeed -- users will probably have one entry point for feed parsing.
Thanks for the feedback @Necoro! This is probably the most important design choice in terms of the future API "UX" and how people feel about using the library for v2. I want to get this right.
I've been wrestling with this design because we have two different types of configuration with different lifetimes:
-
Parser-level config (set once, used for all feeds):
- Strictness settings (how permissive to be with malformed feeds)
- Default behaviors (e.g., whether to parse dates)
- HTTP client configuration
-
Per-request data (different for each feed URL):
- ETag from previous fetch (for If-None-Match)
- Last-Modified timestamp (for If-Modified-Since)
- These come from previous responses and are specific to each feed
Here are the main options I'm considering:
Option A: Parser config + Per-request options (hybrid)
// Setup parser with parser-level config
parser := gofeed.NewParser(&gofeed.Config{
StrictMode: true,
Client: customClient,
})
// Per-request options when needed
feed, _ := parser.Parse(reader) // most calls don't need options
feed, _ := parser.ParseURL(ctx, url, &RequestOptions{
IfNoneMatch: "previous-etag-for-this-url",
})
Option B: Everything in ParseOptions (current PR)
parser := gofeed.NewParser()
opts := &ParseOptions{
StrictMode: true, // has to be set every time :(
RequestOptions: RequestOptions{
IfNoneMatch: "etag",
},
}
feed, _ := parser.ParseURL(ctx, url, opts)
Option C: Variadic to make it truly optional
// Most common case - no options
feed, _ := parser.Parse(reader)
// When you need options
feed, _ := parser.Parse(reader, ParseOptions{
KeepOriginalFeed: true,
})
@Necoro @infogulch @spacecowboy @cristoper let me know if you have thoughts.
My choice would be C > A > B.
But: In the end, I think the impact to be minor. I'd wager that most consumers will have one(!) point where they parse feeds¹ -- and then it's "study documentation once, implement, never touch again", so the actual calling convention is rather unimportant.
The API surrounding dealing with the parsed feeds: that's what you are going to use all over the place (because it's likely too complex to completely wrap into an internal structure).
¹ example from my feed2imap-go: https://github.com/Necoro/feed2imap-go/blob/master/internal/feed/parse.go