cache_req_fsm: keep the cache object's Content-Length for HEAD always
Previously, we would only keep the Content-Length header for HEAD requests on hit-for-miss objects. Now we simply keep it always to enable "fallback" caching of HEAD requests.
The added vtc implements the basics of the logic to enable the (reasonable) use case documented in https://github.com/varnishcache/varnish-cache/issues/2107#issuecomment-2536642262 but using Vary instead of cache key modification plus restart.
Fixes #4245
notes from bugwash:
- There should be a way for VCL to stop sending C-L.
my own homework:
- understand why the current code works for pass
homework: why does the current code work?
diff --git a/bin/varnishd/cache/cache_req_fsm.c b/bin/varnishd/cache/cache_req_fsm.c
index bbcb3824f..91ec23780 100644
--- a/bin/varnishd/cache/cache_req_fsm.c
+++ b/bin/varnishd/cache/cache_req_fsm.c
@@ -493,6 +493,7 @@ cnt_transmit(struct worker *wrk, struct req *req)
* filters have had a chance to chew on it, but that
* would negate the "pass for huge objects" use case.
*/
+ VSLb(req->vsl, SLT_Debug, "HEAD with OC_F_HFM");
} else {
http_Unset(req->resp, H_Content_Length);
if (req->resp_len >= 0)
$ ./varnishtest -iv tests/b00065.vtc | grep -C 5 'HEAD wi'
**** v1 vsl| 1004 RespHeader c Via: 1.1 v1 (Varnish/trunk)
**** v1 vsl| 1004 VCL_call c DELIVER
**** v1 vsl| 1004 VCL_return c deliver
**** v1 vsl| 1004 Timestamp c Process: 1738839586.533446 0.014269 0.000088
**** v1 vsl| 1004 Filters c
**** v1 vsl| 1004 Debug c HEAD with OC_F_HFM
**** v1 vsl| 1004 RespHeader c Connection: keep-alive
**** v1 vsl| 1004 Timestamp c Resp: 1738839586.533676 0.014500 0.000230
**** v1 vsl| 1004 ReqAcct c 53 0 53 165 0 165
**** v1 vsl| 1004 End c
**** v1 vsl| 1003 SessClose c REM_CLOSE 0.016
so the answer is: Because we set OC_F_HFM for passes.
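The before/after behaviour described in the commit message, together with the finding that passes also carry OC_F_HFM, can be modeled in a few lines of C. This is a reduced sketch and not the code in cnt_transmit(): the struct and both function names are hypothetical, and only the HEAD / hit-for-miss part of the decision is represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a request; not Varnish's struct req. */
struct head_case {
	bool is_head;	/* request method is HEAD */
	bool oc_hfm;	/* object has OC_F_HFM (hit-for-miss, also set for pass) */
};

/* Before the change: keep the object's C-L only for HEAD on HFM objects. */
static bool
keep_cl_before(const struct head_case *c)
{
	assert(c != NULL);
	return (c->is_head && c->oc_hfm);
}

/* After the change: keep the object's C-L for every HEAD request. */
static bool
keep_cl_after(const struct head_case *c)
{
	assert(c != NULL);
	return (c->is_head);
}
```

In this model a HEAD on a pass already kept its Content-Length before the change, because the pass sets the HFM flag; only HEAD on regular cache objects behaves differently afterwards.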
In the following, I refer to a response which, by definition, does not have an HTTP body (CONNECT or HEAD request and any response with a 1xx (Informational), 204 (No Content), or 304 (Not Modified) status code) as without body, and to all others as with body, even if the body may be empty.
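To make the terminology concrete, the definition above can be written as a small predicate. This is an illustrative sketch only: resp_without_body() is a hypothetical helper, not a Varnish function, and it follows the definition given in this comment literally.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * True if a response is "without body" in the sense used in this comment:
 * CONNECT or HEAD request, or a 1xx, 204 or 304 status code.
 * Hypothetical helper for illustration.
 */
static bool
resp_without_body(const char *method, int status)
{
	assert(method != NULL);
	if (strcmp(method, "HEAD") == 0 || strcmp(method, "CONNECT") == 0)
		return (true);
	if (status >= 100 && status < 200)
		return (true);
	return (status == 204 || status == 304);
}
```

Note that a 200 response to a GET counts as "with body" here even if its body happens to be zero bytes long.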
* There should be a way for VCL to stop sending `C-L`.
I have pondered the how and worked on an implementation, and for now, I am unhappy with what I have: unset resp.body seems wrong. For responses with body, it would mean sending a Content-Length: 0, while for responses without body it would mean to clear the Content-Length: 0 header. This really is un-POLA.
Coming up with a better alternative is surprisingly hard, because, for most cases, we either create or recreate the Content-Length header when VCL has already finished. Also, trying to find a way to give VCL control over Content-Length gets messy soon because of streaming...
Hence, I lean towards a very simple solution to make the change after VCL has finished:
Add a "removeCL" filter, which
- for responses without body just removes Content-Length
- for responses with body removes Content-Length and prevents a body to be sent, that is, for chunked encoding, it also sends an end chunk.
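The two bullet points can be sketched as a state transformation. This is a model of the proposal under stated assumptions, not an implementation of a Varnish delivery processor: struct resp_model and remove_cl() are hypothetical names, and the without_body flag stands for the definition given earlier in this comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced response state; not a Varnish structure. */
struct resp_model {
	bool without_body;		/* HEAD/CONNECT or 1xx/204/304 */
	bool has_content_length;	/* Content-Length header present */
	bool chunked;			/* chunked transfer encoding in use */
	bool send_body;			/* body bytes would be delivered */
};

/*
 * Model of the proposed "removeCL" behaviour. Returns the bytes that
 * still go on the wire in place of the body, or NULL if there are none.
 */
static const char *
remove_cl(struct resp_model *r)
{
	assert(r != NULL);
	r->has_content_length = false;
	if (r->without_body)
		return (NULL);		/* removing the header is all we do */
	r->send_body = false;		/* suppress the body */
	if (r->chunked)
		return ("0\r\n\r\n");	/* end chunk terminates the stream */
	return (NULL);
}
```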
This implies that filters will also need to run for responses without a body, but, at least from my perspective, this is already an overdue change for consistency.
The vcl interface would be simple, for example:
sub vcl_deliver {
	set resp.filters += " removeCL";
}
@dridi I guess you might have opinions, do you?
Also, trying to find a way to give VCL control over Content-Length gets messy soon because of streaming...
We specifically made content-length and transfer-encoding read-only headers because of their role in HTTP framing (especially HTTP/1.x). So it shouldn't be direct control.
Add a "removeCL" filter
This is interesting, but I don't really understand what you are proposing.
- for responses without body just removes Content-Length
- for responses with body removes Content-Length and prevents a body to be sent, that is, for chunked encoding, it also sends an end chunk.
In both cases we end up without a body delivery, so this really looks like a case for unset resp.body (consistent with unset bereq.body). It would be both a simpler interface and simpler behavior to explain:
- discard the response body if there is one
- discard framing headers (both content-length and transfer-encoding)
We may want to also discard content-encoding if there was a body.
This implies that filters will also need to run for responses without a body, but, at least from my perspective, this is already an overdue change for consistency.
No opinion, I haven't given it much thought, but I am sensitive to the consistency argument. I'm pretty sure that filters today already fiddle with headers, so having headers-only filters makes sense (making the VDP::bytes() and VFP::pull() callbacks optional).
I'm not a big fan of header manipulation in core or VMOD code, but I can see several cases in favor of this kind of "rendez-vous point" ability to tweak headers before/after a delivery/fetch. Essentially cases where VCL syntax is too limited.
Re @dridi
Also, trying to find a way to give VCL control over Content-Length gets messy soon because of streaming...
We specifically made content-length and transfer-encoding read-only headers because of their role in HTTP framing (especially HTTP/1.x). So it shouldn't be direct control.
Yes, I understand what we did and why, but still this might not have been the best solution to the problem.
This ticket is about the response to HEAD requests with respect to the Content-Length returned to clients. We have the following different cases to consider:
1. Stable cache object with known length: The Content-Length header stored with the cache object might still be wrong, and we correct it based on the actual length of the cached body data.
2. Busy object (streaming) with Content-Length: The header might turn out to be wrong, but when VCL runs, it is the best approximation we have.
3. Busy object with chunked encoding: We have no Content-Length at all.
If we wanted to have a correct Content-Length for the busy cases (2) and (3), we could wait for the object to be completely received, but that would not work with transit_buffer.
So, for the case of the HEAD request, we can send a "probably correct" Content-Length with (1), a "maybe correct" Content-Length with (2), and no Content-Length with (3). Which is what this PR does in its current form.
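The three cases and the resulting quality of the Content-Length sent in response to HEAD can be summarized as a small classifier. This is only a sketch of the reasoning above: struct obj_model, enum cl_quality and head_cl_quality() are hypothetical names and do not exist in Varnish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cl_quality {
	CL_NONE,		/* case (3): no Content-Length to send */
	CL_MAYBE_CORRECT,	/* case (2): header of a still-streaming fetch */
	CL_PROBABLY_CORRECT	/* case (1): based on the stored body length */
};

/* Hypothetical, reduced view of a cache object. */
struct obj_model {
	bool busy;		/* still being fetched (streaming) */
	bool has_cl_header;	/* backend sent Content-Length (not chunked) */
};

static enum cl_quality
head_cl_quality(const struct obj_model *o)
{
	assert(o != NULL);
	if (!o->busy)
		return (CL_PROBABLY_CORRECT);
	return (o->has_cl_header ? CL_MAYBE_CORRECT : CL_NONE);
}
```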
Now the bugwash decision was that VCL should have a way to prevent sending Content-Length with a response to HEAD, and the current question is HOW.
In order to not repeat myself, please re-read https://github.com/varnishcache/varnish-cache/pull/4247#issuecomment-2640309332 with the above in mind. The problem is that unset resp.body is, I think, just wrong for the case of "prevent sending Content-Length in response to HEAD".
Add a "removeCL" filter
This is interesting, but I don't really understand what you are proposing.
- for responses without body just removes Content-Length
- for responses with body removes Content-Length and prevents a body to be sent, that is, for chunked encoding, it also sends an end chunk.
In both cases we end up without a body delivery, so this really looks like a case for unset resp.body (consistent with unset bereq.body). It would be both a simpler interface and simpler behavior to explain:
It's not the same, because for unset resp.body we would preferably send Content-Length: 0 in response to a GET.
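The difference can be spelled out as a tiny model of which Content-Length ends up on the wire. It encodes my reading of unset resp.body from the earlier comment, not an agreed design; enum cl_op and cl_after() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum cl_op {
	OP_UNSET_RESP_BODY,	/* unset resp.body, as read in this comment */
	OP_REMOVE_CL		/* the proposed "removeCL" filter */
};

/* Returns the Content-Length value to send, or -1 for "no header". */
static long
cl_after(enum cl_op op, bool without_body)
{
	assert(op == OP_UNSET_RESP_BODY || op == OP_REMOVE_CL);
	if (op == OP_UNSET_RESP_BODY && !without_body)
		return (0);	/* GET: body dropped, so Content-Length: 0 */
	return (-1);		/* header removed in all other cases */
}
```

Only the GET case differs: unset resp.body would advertise an empty body, while removeCL would send no Content-Length at all.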
While @dridi and I should find consensus, I think this PR could be merged. The question of how to avoid sending Content-Length is, I think, separate enough.
The vtc I had added was doing all kinds of things, but did not actually test for the Content-Length. Sorry for that. Fixed now in a force-push.
bugwash: the "delete C-L" option should exist. It would be good to finish the discussion with @dridi.
@dridi ?
OKed by bugwash