gofeed

Add support for a user-defined User-Agent string

Open yamamushi opened this issue 7 years ago • 3 comments

Expected behavior

Parsing https://www.reddit.com/r/games/.rss should work with an appropriate delay in making requests (Reddit asks for 2 seconds between bot requests).

To further describe the issue, this could be resolved if we had the option of defining our own user-agent strings (or any headers for that matter) when calling gofeed.ParseURL(url string) or when constructing our parser with gofeed.NewParser() .

Actual behavior

Returns 429 Too Many Requests, as Reddit filters requests that do not have user-agent strings.

The first request will work, after which Reddit will block all new requests for a period of time.

Steps to reproduce the behavior

package main

import (
	"fmt"
	"time"

	"github.com/mmcdole/gofeed"
)

func main() {
	fp := gofeed.NewParser()
	feed, err := fp.ParseURL("https://www.reddit.com/r/games/.rss")
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	// This first request will work
	fmt.Println(feed.Title)

	time.Sleep(5 * time.Second)

	// This second request will fail because no user-agent string is defined for the request
	secondfeed, err := fp.ParseURL("https://www.reddit.com/r/games/.rss")
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	fmt.Println(secondfeed.Title)
}


yamamushi avatar Jun 26 '17 16:06 yamamushi

As a workaround, you could supply your own transport: implement the http.RoundTripper interface and set the User-Agent header there, like:

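package main

import (
	"fmt"
	"net/http"

	"github.com/mmcdole/gofeed"
)

// UserAgentTransport wraps another RoundTripper and sets the User-Agent
// header on every outgoing request.
type UserAgentTransport struct {
	http.RoundTripper
}

func (c *UserAgentTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	// A RoundTripper should not modify the caller's request, so set the
	// header on a clone (Request.Clone requires Go 1.13 or later).
	r = r.Clone(r.Context())
	r.Header.Set("User-Agent", "<platform>:<app ID>:<version string> (by /u/<reddit username>)")
	return c.RoundTripper.RoundTrip(r)
}

func main() {
	fp := gofeed.NewParser()
	fp.Client = &http.Client{
		Transport: &UserAgentTransport{http.DefaultTransport},
	}
	feed, err := fp.ParseURL("https://www.reddit.com/r/games/.rss")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(feed.Title)
}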

The User-Agent format <platform>:<app ID>:<version string> (by /u/<reddit username>) is the one suggested by the Reddit API documentation.
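To check that a transport like this really changes what the server sees, without hitting Reddit, one option is to point the client at a local test server that echoes the header back. This is a minimal sketch using only the standard library: `userAgentTransport`, `observedUserAgent`, and the example User-Agent value are all made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// userAgentTransport (hypothetical) sets a configurable User-Agent on a
// clone of each request before handing it to the wrapped transport.
type userAgentTransport struct {
	base      http.RoundTripper
	userAgent string
}

func (t *userAgentTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	r = r.Clone(r.Context()) // leave the caller's request untouched
	r.Header.Set("User-Agent", t.userAgent)
	return t.base.RoundTrip(r)
}

// observedUserAgent starts a local server that replies with the User-Agent
// it received, sends one request through the custom transport, and returns
// what the server saw.
func observedUserAgent(userAgent string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	}))
	defer srv.Close()

	client := &http.Client{
		Transport: &userAgentTransport{base: http.DefaultTransport, userAgent: userAgent},
	}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// The value below is a placeholder following the format described above.
	got, err := observedUserAgent("linux:example-feed-reader:v0.1 (by /u/example)")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("server saw:", got)
}
```

Making the User-Agent a struct field rather than a hard-coded string lets one transport type serve several sites with different requirements.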

bogatuadrian avatar Nov 01 '17 14:11 bogatuadrian

@bogatuadrian Thank you very much. This was really useful!

carthics avatar Mar 09 '18 16:03 carthics

#108 should resolve this.

GaruGaru avatar Oct 24 '18 13:10 GaruGaru