boxo icon indicating copy to clipboard operation
boxo copied to clipboard

Gateway: Cache-Control, max-age=ttl, ETag for /ipns requests

Open ion1 opened this issue 10 years ago • 18 comments

The gateway does not seem to provide a Cache-Control/max-age header or an ETag for /ipns URLs.

It might be useful to assume some reasonable non-zero max-age for content on IPNS. A minute? Fifteen seconds?

It would also be good to serve an ETag corresponding to whatever IPFS hash the IPNS path resolved to.

ion1 avatar Oct 09 '15 18:10 ion1

It might be useful to assume some reasonable non-zero max-age for content on IPNS. A minute? Fifteen seconds?

It could be determined by the IPNS record -- some will have a calculable TTL.

jbenet avatar Oct 09 '15 19:10 jbenet

Until then, we could set must-revalidate and a quasi infinite max-age

ghost avatar Oct 09 '15 19:10 ghost

I’m not sure if I have understood this correctly, but I’m under the impression that must-revalidate only applies if the content has already expired. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4

ion1 avatar Oct 09 '15 20:10 ion1

You're right, thanks -- so that means must-revalidate + max-age=0 = cache-indefinitely-unless-updated

ghost avatar Oct 09 '15 20:10 ghost

Note that for /ipfs/, we can still set a quasi-infinite max-age.

ghost avatar Oct 09 '15 20:10 ghost

It seems to me that must-revalidate means that given an object that expired according to max-age, a cache must return an error instead of a stale copy if it fails to connect to the origin server. I’m not sure must-revalidate is of any use here. Unless I’m misinterpreting it, of course.

ion1 avatar Oct 09 '15 22:10 ion1

DNS lets you say explicitly for how long a record can be cached. Perhaps we should do that with IPNS names? ipfs name publish --ttl=3h which would be taken advantage of by IPFS nodes who have resolved the name as well as by the gateway in the Cache-Control header.

ion1 avatar Oct 14 '15 15:10 ion1

It was mentioned on IRC that resolve does not provide TTL information to be used for max-age.

resolveOnce could be changed to return a TTL value. resolve could start at infinity-ish and compute the minimum of the previous TTL and what resolveOnce returned at each iteration.

Given that the DNS lookup functions in the net package for Go do not seem to provide TTL information at the moment, the DNS resolveOnce could return a reasonable guess like one minute. If a DNS record points to /ipns/<keyhash>, that IPNS record can still decrement the TTL according to the iteration above if a low cache time is desired.

If resolve returned a TTL, the if strings.HasPrefix(urlPath, ipfsPathPrefix) check in gateway_handler.go could go away and the header be set unconditionally.

ion1 avatar Oct 28 '15 21:10 ion1

Would this be a reasonable starting point?

diff --git a/namesys/interface.go b/namesys/interface.go
index 09c296c..b0b740d 100644
--- a/namesys/interface.go
+++ b/namesys/interface.go
@@ -47,6 +47,14 @@ const (
    // trust resolution to eventually complete and can't put an upper
    // limit on how many steps it will take.
    UnlimitedDepth = 0
+
+   // ImmutableTTL is the time-to-live value to be reported for immutable
+   // objects.
+   ImmutableTTL = 10 * 365 * 24 * time.Hour
+
+   // UnknownTTL is the time-to-live value to be reported for mutable
+   // paths when accurate information is not available.
+   UnknownTTL = time.Minute
 )

 // ErrResolveFailed signals an error when attempting to resolve.
@@ -88,7 +96,10 @@ type Resolver interface {
    // There is a default depth-limit to avoid infinite recursion.  Most
    // users will be fine with this default limit, but if you need to
    // adjust the limit you can use ResolveN.
-   Resolve(ctx context.Context, name string) (value path.Path, err error)
+   //
+   // Resolve also returns a time-to-live value which indicates the
+   // maximum amount of time the result may be cached.
+   Resolve(ctx context.Context, name string) (value path.Path, ttl time.Duration, err error)

    // ResolveN performs a recursive lookup, returning the dereferenced
    // path.  The only difference from Resolve is that the depth limit
@@ -97,7 +108,7 @@ type Resolver interface {
    //
    // Most users should use Resolve, since the default limit works well
    // in most real-world situations.
-   ResolveN(ctx context.Context, name string, depth int) (value path.Path, err error)
+   ResolveN(ctx context.Context, name string, depth int) (value path.Path, ttl time.Duration, err error)
 }

 // Publisher is an object capable of publishing particular names.
diff --git a/namesys/base.go b/namesys/base.go
index e552fce..29ef997 100644
--- a/namesys/base.go
+++ b/namesys/base.go
@@ -2,6 +2,7 @@ package namesys

 import (
    "strings"
+   "time"

    context "github.com/ipfs/go-ipfs/Godeps/_workspace/src/golang.org/x/net/context"

@@ -9,27 +10,36 @@ import (
 )

 type resolver interface {
-   // resolveOnce looks up a name once (without recursion).
-   resolveOnce(ctx context.Context, name string) (value path.Path, err error)
+   // resolveOnce looks up a name once (without recursion).  It also
+   // returns a time-to-live value which indicates the maximum amount of
+   // time the result may be cached.
+   resolveOnce(ctx context.Context, name string) (value path.Path, ttl time.Duration, err error)
 }

 // resolve is a helper for implementing Resolver.ResolveN using resolveOnce.
-func resolve(ctx context.Context, r resolver, name string, depth int, prefixes ...string) (path.Path, error) {
+func resolve(ctx context.Context, r resolver, name string, depth int, prefixes ...string) (path.Path, time.Duration, error) {
+   // Start with a long TTL.
+   ttl := ImmutableTTL
+
    for {
-       p, err := r.resolveOnce(ctx, name)
+       p, ttlOnce, err := r.resolveOnce(ctx, name)
+       // Use the lowest TTL reported by the resolveOnce invocations.
+       if ttlOnce < ttl {
+           ttl = ttlOnce
+       }
        if err != nil {
            log.Warningf("Could not resolve %s", name)
-           return "", err
+           return "", ttl, err
        }
-       log.Debugf("Resolved %s to %s", name, p.String())
+       log.Debugf("Resolved %s to %s (TTL %v -> %v)", name, p.String(), ttlOnce, ttl)

        if strings.HasPrefix(p.String(), "/ipfs/") {
            // we've bottomed out with an IPFS path
-           return p, nil
+           return p, ttl, nil
        }

        if depth == 1 {
-           return p, ErrResolveRecursion
+           return p, ttl, ErrResolveRecursion
        }

        matched := false
@@ -44,7 +54,7 @@ func resolve(ctx context.Context, r resolver, name string, depth int, prefixes .
        }

        if !matched {
-           return p, nil
+           return p, ttl, nil
        }

        if depth > 1 {

ion1 avatar Oct 29 '15 18:10 ion1

@ion1 yeah, we could return a ttl with every resolution... I'm generally against more return types than 1+error, but this is likely a cleaner approach than others

whyrusleeping avatar Oct 29 '15 19:10 whyrusleeping

<jbenet> […] i'm overall -1 on returning the TTL on every Resolve call. i think that should be a separate function call.
<jbenet> most uses of Resolve do not care.
<jbenet> and its important to keep the interface as clean as we can possible have it.
<ion> jbenet: Fair enough. So it would be better to add a new function to the Resolver interface?
<jbenet> ion: yeah a new function SGTM.

I’m going to do that then.

ion1 avatar Oct 31 '15 01:10 ion1

Just for the record, this is still an open issue for /ipns/ responses returned by go-ipfs v0.4.18.

2019-01-02--14-08-56

Etag is present, but we are missing an explicit IPNS expiration header.

AFAIK it could be either:

  • Cache-Control header with max-age
  • or Expires header with a date

Why? There is a Last-Modified header so most browsers will use "Heuristic Freshness" to determine how long to cache that asset for. This is a weak cache hint, as RFC7234 suggests something like (current time - last modified time) / 10, however, there are subtle differences between Firefox and Chrome.

Due to this we should provide explicit cache expiration headers for /ipns/ to unify experience across all vendors.

lidel avatar Jan 02 '19 13:01 lidel

This will sort-of depend on https://github.com/ipfs/go-ipfs/issues/5884, as ipns supports dns resolution too. 'Real' IPNS records support TTL too, but AFAIK it's currently not used (basically it's set to 0).

magik6k avatar Jan 02 '19 20:01 magik6k

actionable task extracted from https://github.com/ipfs/go-ipfs/pull/8074#pullrequestreview-645196768, seems to be a natural continuation fo this old issue here

TLDR: we are not leveraging TTLs in Gateway responses, which means things are not cached for as long as they could be. This is getting critical for our gateway infra, without this we are unable to cache things on Gateways as efficiently as we could.

What we need

  • Gateway responses for /ipns/* should return Cache-Control based on:
    • TTL from IPNS record
    • TTL of a DNSLink TXT record
  • Remove Last-Modified header (right now returns now timestamp to leverage browser heuristycs: https://github.com/ipfs/go-ipfs/pull/8074, but we can remove it only when proper Cache-Control is in place)
  • Etag based on the content path from the record. Together with Cache-control it enables cache at edge and user agents to revalidate cache instead of re-downloading data that did not change and which they already have. We already do this for files and dirs, just mentioning here for completeness.

Implementation notes

  • iiuc we need to extend the namesys interface to return TTL of both record types in addition to Path (as suggested earlier)
  • wire ttl up in gateway code, so cache-control header is returned
  • have tests that confirm /ipns/ content paths have proper cache controls on all gateway types: path, subdomain, and dnslink
  • We already have DNS.MaxCacheTTL in Kubo config, adding Ipns.MaxCacheTTL sounds sensible (to allow gateway operators force check for new IPNS record after some sane time)

cc https://github.com/ipfs/go-ipfs/issues/5884, https://github.com/ipfs/go-ipfs/issues/7564

lidel avatar Jan 18 '22 21:01 lidel

Reopening. We have spec at https://specs.ipfs.tech/http-gateways/path-gateway/#cache-control-response-header, but boxo/gateway does not implement it fully.

This is not fixed fully yet.

IPNS Records are handled correctly thanks to https://github.com/ipfs/boxo/pull/459 but we don't set Cache-Control with max-age based on TTL as noted in https://github.com/ipfs/boxo/pull/459#issuecomment-1705462860. I remember we did not have TTL there because Golang standard library is not exposing access to original TTL of DNS record when resolver from operating system is used.

Remaining work here

  • [ ] Always return Cache-Control on /ipns/{dnslink} responses. Either correctly wire up TTL from DNS TXT record of DNSLink, or use 5 minute as default fallback when it is not available.
  • [ ] port Cache-Control implementation from https://github.com/ipfs/boxo/pull/584, where value is public, max-age=<ttl>, stale-while-revalidate=<dht-or-other-expiration-window>, stale-if-error=<dht-or-other-expiration-window>
  • [ ] update gateway-conformance tests to test for presence of cache-control on DNSLink response

lidel avatar Mar 13 '24 18:03 lidel

port Cache-Control implementation from https://github.com/ipfs/boxo/pull/584, where value is public, max-age=, stale-while-revalidate=, stale-if-error=

For DNSLink, what would you recommend the stale-while-revalidate be, since it has nothing to do with the DHT?

2color avatar Mar 14 '24 10:03 2color

Good question. Unless there is a better value, I suggest using Amino DHT expiration window (48h) in cases like this, where we need a sensible hard ceiling on caching mutable things.

Rationale: DNSLink resolves to some CID. Returning a cached HTTP response body is ok as long the the data behind that CID could be (in theory) retrievable from the public swarm provider. So if we count from the moment we resolved dnslink and successfully retrieved it, 48h is the max window we can assume the retrieval of the same payload would be successful.

lidel avatar Mar 14 '24 14:03 lidel