
Add support for ETag, Last-Modified, If-None-Match and If-Modified-Since HTTP headers on request and response

Open requaos opened this issue 6 years ago • 8 comments

Expected behavior

The ETag and Last-Modified header values are available on the Feed object

Actual behavior

These values are neither read from the response nor exposed to the caller

requaos avatar Dec 29 '18 22:12 requaos

I agree, this should be something we support.

I honestly never expected people to use ParseURL() very much, and instead expected them to make their own HTTP requests and feed the reader to gofeed.

But it seems people do want to use it, so I should make this interface more robust.

mmcdole avatar Dec 29 '18 22:12 mmcdole

I'll just leave this here:


package feedreader // placeholder name; the original snippet had no package clause

import (
	"errors"
	"net/http"
	"time"

	"github.com/mmcdole/gofeed"
)

// ErrNotModified is returned when the server answers 304 Not Modified.
var ErrNotModified = errors.New("not modified")

// Reader fetches a feed, sending the cached validators along with the request.
type Reader interface {
	ReadFeed(url string, etag string, lastModified time.Time) (*Feed, error)
}

// New returns a Reader that issues its requests through client.
func New(client *http.Client) Reader {
	return &reader{
		feedReader: gofeed.NewParser(),
		client:     client,
	}
}

type reader struct {
	feedReader *gofeed.Parser
	client     *http.Client
}

// Feed wraps gofeed.Feed with the validators taken from the HTTP response.
type Feed struct {
	*gofeed.Feed

	ETag         string
	LastModified time.Time
}

func (r *reader) ReadFeed(url string, etag string, lastModified time.Time) (*Feed, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", "Gofeed/1.0")

	// Send the ETag exactly as received: the header value already carries its quotes.
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}

	// Skip the header when no previous Last-Modified value is known.
	if !lastModified.IsZero() {
		req.Header.Set("If-Modified-Since", lastModified.UTC().Format(http.TimeFormat))
	}

	resp, err := r.client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified {
		return nil, ErrNotModified
	}

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return nil, gofeed.HTTPError{
			StatusCode: resp.StatusCode,
			Status:     resp.Status,
		}
	}

	feedBody, err := r.feedReader.Parse(resp.Body)
	if err != nil {
		return nil, err
	}

	feed := &Feed{
		Feed: feedBody,
		ETag: resp.Header.Get("ETag"),
	}

	// http.ParseTime accepts the date formats HTTP allows; unparsable values are ignored.
	if lm := resp.Header.Get("Last-Modified"); lm != "" {
		if parsed, err := http.ParseTime(lm); err == nil {
			feed.LastModified = parsed
		}
	}

	return feed, nil
}

requaos avatar Jan 02 '19 19:01 requaos

@mmcdole To provide this functionality, I only needed to extend the Feed struct with ETag and LastModified fields.

requaos avatar Jan 02 '19 19:01 requaos

I could make a PR that changes the ParseURL signature to ParseURL(url string, opts ...Options) and defines an ETag option and a Last-Modified option. Because the second parameter is variadic, existing calls that pass only a URL would keep compiling, so we could add this functionality without breaking current callers.

requaos avatar Jan 04 '19 15:01 requaos

@requaos I like the idea of the variadic parameter.

Some items in Options that I know people have requested:

  • ETag
  • LastModified
  • User-Agent
  • Timeout

mmcdole avatar Jan 06 '19 14:01 mmcdole

Are there any updates on this issue yet?

aliml92 avatar May 08 '23 04:05 aliml92

"By changing the signature to accept a variadic second parameter we would be able to add this functionality without introducing a breaking change to the API."

I like this idea, but how would the caller retrieve the ETag/Last-Modified header values from the ParseURL API? There's nowhere to return them, so I guess you'd have to modify the Feed struct... but then those fields would stay empty for functions like Parse, which receives only the body and never sees the response headers.

I think the only option may be a new API along the lines of requaos's idea above: https://github.com/mmcdole/gofeed/issues/111#issuecomment-450956821

infogulch avatar Aug 05 '23 06:08 infogulch