varnish-cache Content-Length is always 0 for content obtained from a response to a HEAD request

Expected Behavior

When the backend returns a Content-Length while replying to a HEAD request, it should be passed to the client.

Current Behavior

When the content was fetched in response to a HEAD request, the client receives Content-Length: 0.

Possible Solution

No response

Steps to Reproduce (for bugs)

Build Varnish from 772d7385dbe2cfa8b70d98af0b5da1236727e58b
Use the following VCL (see #2107):

vcl 4.1;

backend default {
    # A Debian archive mirror
    .host = "D.E.F.G";
    .port = "80";
}


sub vcl_recv {
    unset req.http.X-Fetch-Method;
    if (req.restarts == 1) {
        return (hash);
    }
}

sub vcl_hash {
    if (req.restarts == 1) {
        hash_data(req.method);
    }
}

sub vcl_miss {
    if (req.method == "HEAD") {
        if (req.restarts == 0) {
            return (restart);
        }
        set req.http.X-Fetch-Method = "HEAD";
    }
}

sub vcl_backend_fetch {
    if (bereq.http.X-Fetch-Method) {
        set bereq.method = bereq.http.X-Fetch-Method;
    }
    unset bereq.http.X-Fetch-Method;
}

Start tshark -f "host D.E.F.G" -t ad -V
Execute curl http://localhost/pub/debian/README.html -I
tshark show the request:

Hypertext Transfer Protocol                                                                                                                  
    HEAD /pub/debian/README.html HTTP/1.1\r\n

and the reply:

Hypertext Transfer Protocol                                                                                                                  
    HTTP/1.1 200 OK\r\n                                                                                                                      
        [Expert Info (Chat/Sequence): HTTP/1.1 200 OK\r\n]                                                                                   
            [HTTP/1.1 200 OK\r\n]                                                                                                            
            [Severity level: Chat]                                                                                                           
            [Group: Sequence]                                                                                                                
        Response Version: HTTP/1.1                                                                                                           
        Status Code: 200                                                                                                                     
        [Status Code Description: OK]                                                                                                        
        Response Phrase: OK                                                                                                                  
    Date: Mon, 30 Dec 2024 16:42:24 GMT\r\n                                                                                                  
    Server: Apache\r\n                                                                                                                       
    X-XSS-Protection: 1\r\n                                                                                                                  
    Content-Security-Policy: default-src 'self' *._____.__\r\n                                                                               
    X-Content-Type-Options: nosniff\r\n                                                                                                      
    Last-Modified: Sat, 09 Nov 2024 09:23:03 GMT\r\n                                                                                         
    ETag: "b67-626776a22999e-gzip"\r\n                                                                                                       
    Accept-Ranges: bytes\r\n                                                                                                                 
    Vary: Accept-Encoding\r\n                                                                                                                
    Content-Encoding: gzip\r\n                                                                                                               
    Content-Length: 967\r\n                                                                                                                  
        [Content length: 967]                                                                                                                
    Content-Type: text/html\r\n                                                                                                              
    \r\n                                                                                                                                     
    [HTTP response 1/1]                                                                                                                      
    [Time since request: 0.012166528 seconds]                                                                                                
    [Request in frame: 4]                                                                                                                    
    [Request URI: http://localhost/pub/debian/README.html]

The client receives this:

HTTP/1.1 200 OK
Date: Mon, 30 Dec 2024 16:42:24 GMT
Server: Apache
X-XSS-Protection: 1
Content-Security-Policy: default-src 'self' *._____.__
X-Content-Type-Options: nosniff
Last-Modified: Sat, 09 Nov 2024 09:23:03 GMT
ETag: "b67-626776a22999e-gzip"
Vary: Accept-Encoding
Content-Type: text/html
X-Varnish: 3
Age: 0
Via: 1.1 localhost (Varnish/trunk)
Accept-Ranges: bytes
Content-Length: 0
Connection: keep-alive

Context

Sending a Content-Length of 0 looks incorrect:

in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET.

according to https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13

It also breaks some clients such as (Go-http-client used by Packer doesn't even attempts to send a GET after the HEAD as it assumes that the file is empty).

Varnish Cache version

varnishd (varnish-trunk revision 772d7385dbe2cfa8b70d98af0b5da1236727e58b)

Operating system

Debian 12

Source of binary packages used (if any)

N/A

Dec 30 '24 16:12 sbraz

#4247 contains the necessary change to make this work. The test case there contains a different approach to implementing the same functionality avoiding the restart and cache key modification, but using Vary instead. On the way there, I also implemented a test case based on the code snippet herein, so if anyone is interested, here it is:

varnishtest "cache a HEAD as a fallback for a GET"

server s1 {
	rxreq
	expect req.method == "HEAD"
	expect req.http.t == "headmiss"
	txresp -nolen -hdr "Content-Length: 42"

	rxreq
	expect req.method == "GET"
	expect req.http.t == "getmiss"
	txresp -bodylen 42
} -start

varnish v1 -vcl+backend {
    sub vcl_recv {
	unset req.http.X-Fetch-Method;
	if (req.restarts == 1) {
	    return (hash);
	}
    }

    sub vcl_hash {
	if (req.restarts == 1) {
	    hash_data(req.method);
	}
    }

    sub vcl_miss {
	if (req.method == "HEAD") {
	    if (req.restarts == 0) {
		return (restart);
	    }
	    set req.http.X-Fetch-Method = "HEAD";
	}
    }

    sub vcl_backend_fetch {
	if (bereq.http.X-Fetch-Method) {
	    set bereq.method = bereq.http.X-Fetch-Method;
	}
	unset bereq.http.X-Fetch-Method;
    }
} -start

client c1 {
	# miss
	txreq -method "HEAD" -hdr "t: headmiss"
	rxresphdrs
	# hit
	txreq -method "HEAD" -hdr "t: headhit"
	rxresphdrs

	# miss
	txreq -hdr "t: getmiss"
	rxresp
	# hits on full object
	txreq -hdr "t: gethit"
	rxresp
	txreq -method "HEAD" -hdr "t: getheadhit"
	rxresphdrs
} -run

server s1 -wait

Again, this is only for completeness, I do NOT recommend using this approach.

Jan 02 '25 15:01 nigoroll

Thanks @nigoroll, I will test the code from #4247.

Again, this is only for completeness, I do NOT recommend using this approach.

Does it mean you recommend using the snippet with Vary? I assume it provides the same functionality as the one with restart (that a HEAD can still be satisfied from a cached GET)? If it is indeed the recommended way, would it be possible to add the snippet to the documentation somewhere? Maybe https://github.com/varnishcache/varnish-cache/blob/772d7385dbe2cfa8b70d98af0b5da1236727e58b/doc/sphinx/reference/vcl_step.rst#vcl_backend_fetch?

Jan 02 '25 15:01 sbraz

@sbraz yes I would recommend the Vary way, because it avoids the complications with restart.

I would be hesitant to add this particular and quite special use case to the core documentation, but rather add it to https://varnish-cache.org/tips/index.html, which we intend to revive.

Jan 02 '25 15:01 nigoroll