varnish-cache
varnish-cache copied to clipboard
Do we handle 304 responses correctly?
I'm wondering if there aren't some potential problems unrelated to this patch that we have uncovered. The fact that Varnish will enter 304 revalidation logic even though we never asked for it (didn't send out any IMS/INM headers) is worrying to me. I wonder if we shouldn't keep track somehow of whether to expect a 304, and error out if we still got a 304. Could prevent a future gotcha at least.
Originally posted by @mbgrydeland in https://github.com/varnishcache/varnish-cache/issues/4013#issuecomment-1838688414
We already error out when we get an unexpected 304 response from the backend : https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/cache/cache_fetch.c#L375-L378
if (bo->stale_oc != NULL &&
ObjCheckFlag(bo->wrk, bo->stale_oc, OF_IMSCAND)) {
....
} else if (!bo->uncacheable) {
/*
* Backend sent unallowed 304
*/
VSLb(bo->vsl, SLT_Error,
"304 response but not conditional fetch");
bo->htc->doclose = SC_RX_BAD;
vbf_cleanup(bo);
return (-1);
}
I guess the discrepancy to investigate is that OF_IMSCAND doesn't track whether we sent IMS/INM on the fetch, only if we could've done that. And especially for zero-length objects, it is possible to not send IMS/INM, but still execute 304 handling if the backend responded with 304 anyways.
I'm of the opinion that if we can't see it this problem does not exist, so may I suggest starting with a new addition to vmod_debug?
.. NB: keep enum in sync with obj_attr.h
$Function INT obj_attr_size($Enum {LEN, VXID, FLAGS, GZIPBITS, LASTMODIFIED, VARY, HEADERS, ESIDATA, GZIPED, CHGCE, IMSCAND, ESIPROC})
$Restrict vcl_deliver, vcl_backend_refresh, vcl_backend_response
Give the size of an attribute or zero if it is missing for:
- ``obj`` in ``vcl_deliver``
- ``obj_stale`` in ``vcl_backend_refresh``
- ``beresp`` in ``vcl_backend_response``
This way it can also be used in boolean expressions. Once we have that we can add more test cases, or more check to existing test cases. And since we want to solve this before #3994 we would initially skip the obj_stale
inspection.
Does it make sense?
My view is that the IMS header indicates to VCL that conditional fetch is available, but nothing prevents VCL from replacing it with If-None-Match or another "better" conditional, for whatever value of "better" the VCL writer prefers.
The only dubious case that the bereq has no conditionals and a buggy backend sends 304, something I do not see as a high priority for us.
I'm not even convinced that all conditionals, also private ones, are mandated to have "If-*" as a prefix, is that written down somewhere ?
I wen through what the RFCs have to say about 304 in general and caching, and my conclusion is that as of today we are doing things mostly correctly, and probably close to the best we can do with the current management.
One big difference between the way we handle revalidation and what the RFCs have to say is the single stale object we embark in a [cond]fetch task. We're supposed to update the headers of all the stored responses matching a 304, but we only do it with the single stale response. For practical reasons, we should keep doing it.
The RFCs talk about preconditions based on weak and strong validators, but says nothing about preconditions being expressed with header fields prefixed with if-
:
Other preconditions, defined by other specifications as extension fields, might place conditions on all recipients, on the state of the target resource in general, or on a group of resources.
We support three preconditions, two of which raising OF_IMSCAND
.
Regarding validators, only two are defined by the standard, and those are the two we currently use. last-modified
is considered weak by default, but can be treated as strong by the application if the clock resolution is deemed good enough. etag
is explicitly marked as weak, or strong by default, and we already have rules to weaken a strong entity tag when we fiddle with a response body.
As of today, with revalidation handled by the core code, I think it is fair game to accept 304s only when we expected them after setting up the preconditions.
Another thing to consider is the request method, and in order to refresh a stored response there needs to be a match there as well. Considering how all cache misses are turned into GET
backend requests, we can leave this as an exercise to the VCL author to support revalidation in other contexts. I didn't look at the WebDAV spec, but it is the main well-known extension, plus the built-in VCL currently turns WebDAV traffic into pipe mode (caching WebDAV has always been an exercise left to the VCL author).
What we should consider for #3994 moving forward:
- check current preconditions in
vcl_backend_refresh
- the built-in VCL "fails" (see below) a refresh in the absence of known preconditions
- check
stale_obj
validity invcl_backend_refresh
- if the stale template object is no longer usable, "fail" the refresh (see below)
- likewise, by default we only refresh 200 responses
- when a refresh logically fails
- unset built-in preconditions (
bereq.http.{if-modified-since,if-none-match}
-
return (retry (fetch))
as suggested in #4082
- unset built-in preconditions (
Something like this for the built-in:
sub vcl_backend_refresh {
call vcl_builtin_backend_refresh;
return (merge);
}
sub vcl_builtin_backend_refresh {
call vcl_refresh_valid;
call vcl_refresh_status;
call vcl_refresh_conditions;
}
sub vcl_refresh_valid {
if (!obj_stale.is_valid) {
call vcl_refresh_error;
}
}
sub vcl_refresh_status {
if (obj_stale.status != 200) {
call vcl_refresh_error;
}
}
sub vcl_refresh_conditions {
if (!bereq.http.if-modified-since &&
!bereq.http.if-none-match) {
call vcl_refresh_error;
}
}
sub vcl_refresh_error {
unset bereq.http.if-modified-since;
unset bereq.http.if-none-match;
return (retry(fetch));
}
Anyway, as far as this ticket is concerned, I think we are fine with our current 304 handling and we can close it.
Slightly modified built-in to avoid side effects on bereq.retries
that could disturb existing VCL:
sub vcl_backend_refresh {
call vcl_builtin_backend_refresh;
return (merge);
}
sub vcl_builtin_backend_refresh {
call vcl_refresh_valid;
call vcl_refresh_status;
call vcl_refresh_conditions;
}
sub vcl_refresh_valid {
if (obj_stale.retried) { # read-only, analogous to bereq.retries, but BOOL
return (error);
}
if (!obj_stale.is_valid) {
call vcl_refresh_retry;
}
}
sub vcl_refresh_status {
if (obj_stale.status != 200) {
call vcl_refresh_retry;
}
}
sub vcl_refresh_conditions {
if (!bereq.http.if-modified-since &&
!bereq.http.if-none-match) {
return (error);
}
}
sub vcl_refresh_retry {
unset bereq.http.if-modified-since;
unset bereq.http.if-none-match;
# Same transition as return (fetch) from vcl_backend_fetch,
# but turns into an error if obj_stale.retried already true.
return (fetch);
}
sub vcl_refresh_conditions { if (!bereq.http.if-modified-since && !bereq.http.if-none-match) { call vcl_refresh_error; } }
I think this should rather be return (error);
because if we got a 304 from the backend when we didn't ask for it, then there is no point in retrying the exact same request another time and expect a different response.
I agree, I will edit my previous comment accordingly.
edit: I also renamed vcl_refresh_error
to vcl_refresh_retry