fetch icon indicating copy to clipboard operation
fetch copied to clipboard

redirects and etag/if-none-match

Open wanderview opened this issue 1 year ago • 20 comments

What is the issue with the Fetch Standard?

@mrpickles and I were looking into how redirects and http caching work together recently. We noticed that redirects (301/302) don't seem to send if-none-match headers even if the redirect response contains an etag. This behavior seems consistent across chrome, firefox, and safari.

Does the spec currently reflect this? I can't seem to find where it defines this restriction. Step 25.2.2.2.1 of http-network-or-cache-fetch doesn't seem to check the status code of the stored response at all. I also don't see it in the http-caching RFC. Am I missing it somewhere or do we need to add this check to the spec?

For reference, I think this is where chromium restricts the status code:

https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_cache_transaction.cc;l=3029;drc=aeeec53b304d068aa96fe84816d1d6c9d96f454e

wanderview avatar Aug 23 '24 16:08 wanderview

Just to clarify:

Given request reqA that receives a response resA, and resA contains a redirection status code and Location header pointing to B. You seem to be expecting reqB to contain If-None-Match header fields, but it's not clear where the Etags in them were obtained: is it from a stored response from B, or from resA?

If it's a stored response from B, I agree that the request should contain a conditional (although it is not required to by HTTP, at least). However, if it's from reqA, that's problematic; ETags obtained from A have nothing to do with B.

mnot avatar Aug 26 '24 03:08 mnot

it's not clear where the Etags in them were obtained: is it from a stored response from B, or from resA?

This is the order of operations:

  1. You visit A. The response contains an Etag header and a Location header that points to B. The response has a redirection status code.
  2. You get redirected to B. The response from B doesn't matter in this case.
  3. You visit A again. There's no If-None-Match header in the request, even though the earlier response from A had an Etag header.

So the question here is whether it's appropriate for subsequent requests to A to contain the header since a previous response had an ETag, even though the status code is a redirect.

MrPickles avatar Aug 26 '24 04:08 MrPickles

Ah, I see, thanks.

mnot avatar Aug 26 '24 04:08 mnot

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

I guess the Chromium code says it makes sense for 206 as well, but isn't a 304 always translated to a 200?

annevk avatar Aug 26 '24 06:08 annevk

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

That's my understanding, at least. (I suppose it's also plausible that the storedResponse variable not having any body contents could cause it to be considered null.)

I guess the Chromium code says it makes sense for 206 as well, but isn't a 304 always translated to a 200?

What does 304 translating to a 200 mean? From what I can tell, a 304 response causes the client to completely ignore the response body.

So 304 behavior in practice seems inconsistent among browsers. See https://cr.kungfoo.net/mrpickles/http_cache/res_304_with_200_first.php for a quick demo of 304 and ETag behavior.

  • On Chrome and Safari, the client will send If-None-Match headers on subsequent requests if the original response had a 200 and an ETag.
  • On Firefox, the client never seems to be send If-None-Match headers. I'm not sure if this is some browser setting, but it was the case for both normal and private windows.
  • On Chrome, making a request to a page that always 304's is effectively a no-op.
  • On Safari and Firefox, making a request to a page that always 304's will render the (empty) page.

MrPickles avatar Aug 26 '24 16:08 MrPickles

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

Does something like https://github.com/MrPickles/fetch/commit/92fee8d25c509b930cbe20541c470d679f6cfaac look to be on the right track? (This is my first time doing web standards stuff.) If so, I can open up a pull request.

MrPickles avatar Aug 26 '24 16:08 MrPickles

What does 304 translating to a 200 mean?

We end up replacing a 304 network response with its stored response (so that a fetch caller almost never sees a 304, unless they control the revalidating headers themselves). We currently don't assert that the stored response has a 200 status code, but I think the HTTP specifications do end up requiring that. Might need to be further clarified as well. @mnot's input on this would be appreciated.

Does something like https://github.com/MrPickles/fetch/commit/92fee8d25c509b930cbe20541c470d679f6cfaac look to be on the right track?

That does look good, modulo whether we should check for "ok status", "200", or "200 or 206".

annevk avatar Aug 26 '24 17:08 annevk

For reference, I think this is where chromium restricts the status code:

https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_cache_transaction.cc;l=3029;drc=aeeec53b304d068aa96fe84816d1d6c9d96f454e

There may be additional filtering elsewhere. From observation, the browser doesn't send If-None-Match headers if the original response returned an ETag but had a 206 status code.

Does something like https://github.com/MrPickles/fetch/commit/92fee8d25c509b930cbe20541c470d679f6cfaac look to be on the right track?

That does look good, modulo whether we should check for "ok status", "200", or "200 or 206".

Would it reasonable to start with specifying "ok status" and then tightening the spec as we gain more certainty around the intended behavior? Since step 8.25.2 currently specifies no restrictions, I'd imagine we'd want to start with the smallest reasonable "logical spec diff" and incrementally proceed from there.

MrPickles avatar Aug 28 '24 14:08 MrPickles

Typically for changes to the standard we try to apply "measure twice cut once". As per https://whatwg.org/working-mode#changes we'll also need tests and from those we'd be able to inform ourselves as to what to require and which implementations might need changes. (This isn't always possible, but I think for this change it should be.)

annevk avatar Aug 28 '24 15:08 annevk

Ack, sounds good.

I did a bit of poking around to see how the different browsers react (demo here). From what I'm seeing, the browser behaviors are inconsistent, and I haven't been able to pattern-match on what they actually do.

Let me get back to you on this once I've investigated a bit more and have a more rigorous description of each browser's behavior.

MrPickles avatar Sep 06 '24 17:09 MrPickles

Alright, so I've documented some patterns from the three browsers. This is from making subsequent requests to the same URL that always returns an ETag but returns a random response status code.

Below is the behavior of each browser if you make multiple requests to the same URL. I can't believe I'm saying this, but it's easier to describe the existing behavior using (semi-)normative language...

Chrome

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let sendIfNoneMatchHeaders be a boolean flag. By default, it is false.
  3. Let req be the HTTP request the browser is going to send.
  4. If sendIfNoneMatchHeaders is true and etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  5. Send req to the URL.
  6. Let res be the HTTP response from req.
  7. Let status be the HTTP status code from res.
  8. Let responseEtag be the contents of the ETag header from res.
  9. If status is 200, then:
    1. If etag null, then:
      1. Set etag to be responseEtag.
    2. Set sendIfNoneMatchHeaders to true.
  10. If status is not 200 or 206, then:
    1. Set sendIfNoneMatchHeaders to true.
  11. Rinse and repeat.

So the way Chrome works is that it'll start storing ETags once it gets a 200. And it'll keep sending the ETag as long as it keeps getting 200's or 206's from the same URL. If the "chain gets broken" by a non-200/206, getting another 200 response will allow the browser to continue sending the original ETag.

Firefox

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let neverSendIfNoneMatchHeadersEverAgain be a boolean flag. By default, it is false.
  3. Let req be the HTTP request the browser is going to send.
  4. If neverSendIfNoneMatchHeadersEverAgain is false and etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  5. Send req to the URL.
  6. Let res be the HTTP response from req.
  7. Let status be the HTTP status code from res.
  8. Let responseEtag be the contents of the ETag header from res.
  9. If status is 2xx and etag is null, then:
    1. Set etag to be responseEtag.
  10. If status is not 2xx, then:
    1. Set neverSendIfNoneMatchHeadersEverAgain to true.
  11. Rinse and repeat.

Firefox seems to stop sending ETags indefinitely as soon as it receives a non-2xx from the server. But otherwise, it will save the ETag from a 2xx response and keep sending the ETag back as long as it has a 2xx response.

Safari

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let req be the HTTP request the browser is going to send.
  3. If etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  4. Send req to the URL.
  5. Let res be the HTTP response from req.
  6. Let responseEtag be the contents of the ETag header from res.
  7. Let shouldClearEtag to be whether (etag is not null).
  8. Set etag to be responseEtag.
  9. If shouldClearEtag, then:
    1. Set etag to null.
  10. Rinse and repeat.

Safari is weird. It doesn't care about the response code. Rather, it seems that for every other request, it clears the ETag.

MrPickles avatar Sep 12 '24 15:09 MrPickles

The browser behaviors seem pretty inconsistent, so I'm at a loss for what the proper spec should be. Unless we keep the spec super vague, at least one browser will be violating the spec. How would you recommend proceeding?

MrPickles avatar Sep 12 '24 15:09 MrPickles

Any opinions? It seems like chrome and firefox are pretty close. We could try to sidebar about it at TPAC.

wanderview avatar Sep 13 '24 17:09 wanderview

I was chatting with @wanderview and it would seem as if what Chrome does makes the most sense (and the spec should generally match Chrome's behavior). Here are some thoughts:

  • Safari's behavior effectively ignores status code, which doesn't seem reasonable. For example, why would a browser want to cache if there's a 500?
  • Firefox never sends if-none-match headers ever again if it receives a non 2xx code, which also doesn't seem quite reasonable. A server could have an intentional 302 and then later legitimately change to a 200. But then Firefox won't cache future variants.
  • Chrome is pretty reasonable all things considered. And 200 and 206 are the only status codes that imply that there's content to cache.

I can draft a PR with the proposed spec changes and write a matching WPT. (But please let me know if the proposed spec or the approach seems unreasonable.)

MrPickles avatar Oct 07 '24 18:10 MrPickles

For example, why would a browser want to cache if there's a 500?

And 200 and 206 are the only status codes that imply that there's content to cache.

Non-200 status codes can be stored and revalidated, and are in practice. Please don't deviate from the HTTP caching spec without good reason - reinventing it causes interop issues with non-browser caches.

mnot avatar Oct 07 '24 23:10 mnot

Non-200 status codes can be stored and revalidated, and are in practice. Please don't deviate from the HTTP caching spec without good reason - reinventing it causes interop issues with non-browser caches.

Are any browsers following that spec currently here, though? See the results of testing here:

https://github.com/whatwg/fetch/issues/1770#issuecomment-2346644945

Edit: And we are open to suggestions for a path to interop here.

wanderview avatar Oct 08 '24 21:10 wanderview

I was speaking more in general - e.g., see the link to the cache tests, where browsers do cache non-200 status codes.

mnot avatar Oct 11 '24 03:10 mnot

@MrPickles apologies for the late reply, but the general approach sounds reasonable. Mark has a point though that we need to figure out the discrepancies with https://cache-tests.fyi/#status as following HTTP is a goal where that is possible.

annevk avatar Dec 06 '24 15:12 annevk

I took a look through the cache tests and spec, and from what I can tell, it doesn't actually seem to conflict with the fetch spec changes. (Note that this isn't any proposal to change any spec, but rather my observations on the existing specs and whether they'd conflict.)

If we look at the cache tests, the logic basically goes as follows:

  1. The client sends a request with Cache-Control headers set accordingly.
  2. The server gives a response with any status code.
  3. The client seconds a second request, and the server gives a 304 response.
  4. In such a case, the original response should have been cached.

The section of the fetch spec goes over the slightly different logic:

  1. The client sends a request (not necessarily with Cache-Control headers set accordingly).
  2. The server sends an ETag value back. The response status code may vary.
  3. Depending on the response status code, should the client include an If-None-Match header in subsequent requests?

The fetch spec changes discussed in this thread focus more on when the client chooses to send If-None-Match headers, whereas the caching spec is more about when the client is obligated to cache results. So they don't necessarily conflict. For example, the browser may arbitrarily choose whether or not to send If-None-Match headers, but if the HTTP response is a 304, it'll still need to use the cached response from a previous request.

So from what I can tell, tweaking behavior around the fetch spec does not have to conflict with the HTTP cache spec, since the behavior we're looking to adjust is around request headers. This is my understanding after looking around at the specs, so please double check my thought process and correct me if I'm mistaken or missing something.

Miscellaneous/tangential observations

To amend a previous quote:

And 200 and 206 are the only status codes that imply that there's content to cache.

And 200 and 206 are the only status codes that imply that there's probably interesting content to cache, such that the browser might actively want to ask the server to confirm whether it should use the cached results.

Also, from what I can tell, the "conditional requests" section of the cache tests (https://cache-tests.fyi/#conditional-inm) cover how the cache should response if If-None-Match headeres are sent, but it does not specify when the client must send those headers. So that doesn't conflict with the fetch spec discussion in this thread either.

The caching spec does bring up a fair point that all response status codes should be cache-able. So checking status code in step 8.25.2 of network-or-cache-fetch might be too broad. It may make sense to modify step 8.25.2.2.2 or around there. (This is just me musing to myself, not an actual change proposal.)

MrPickles avatar Dec 18 '24 16:12 MrPickles

I initially thought that would go against the specification (as Mark did mention revalidation above, which is what you're talking about), but https://httpwg.org/specs/rfc9111.html#rfc.section.4.3 does suggest to me that revalidation is optional (emphasis mine):

it can use the conditional request mechanism

It seems that we could decide that for the web platform that while we can cache all status codes, we're not going to validate all of them as we don't believe that to be useful or web compatible.

annevk avatar Dec 19 '24 07:12 annevk