Backend workspace overflow through lost header doesn't fail the response
Expected Behavior
If a workspace overflow occurs due to a lost header, the request should fail.
Current Behavior
If a workspace overflow occurs due to a lost header, the response from the server is delivered anyway.
Possible Solution
No response
Steps to Reproduce (for bugs)
The expectation would be that the second request would also return a 500 (or 503).
varnishtest "Workspace overflow through LostHeader"
varnish v1 -cliok "param.set http_max_hdr 32"
server s1 -repeat 2 {
	rxreq
	txresp
} -start
varnish v1 -vcl+backend {
	sub vcl_recv {
		if (req.url == "/1") {
			set req.http.make-overflow = "some value";
		}
	}
} -start
logexpect l1 -v v1 {
	expect * 1001 LostHeader "make-overflow: some value"
	expect * 1001 Error "out of workspace"
	expect * 1001 RespStatus "200"
	expect * 1001 VCL_return "deliver"
	expect * 1001 Error "workspace_client overflow"
	expect * 1005 Error "out of workspace"
	expect * 1005 LostHeader "Accept-Encoding"
	expect * 1005 Error "out of workspace"
	expect * 1005 LostHeader "X-Varnish: 1005"
	expect * 1005 BerespStatus "200"
} -start
# 1. Trigger client workspace overflow through LostHeader in vcl_recv
# 2. Go to vcl_backend_fetch and cause a backend workspace overflow and
# still return a 200
# 3. Notice a workspace overflow after vcl_deliver and give client 500 response
client c1 {
	txreq -url "/1" -hdr "h1: 1" -hdr "h2: 2" \
	    -hdr "h3: 3" -hdr "h4: 4" -hdr "h5: 5" \
	    -hdr "h6: 6" -hdr "h7: 7" -hdr "h8: 8" \
	    -hdr "h9: 9" -hdr "h10: 10" -hdr "h11: 11" \
	    -hdr "h12: 12" -hdr "h13: 13" -hdr "h14: 14" \
	    -hdr "h15: 15" -hdr "h16: 16" -hdr "h17: 17" \
	    -hdr "h18: 18" -hdr "h19: 19" -hdr "h20: 20" \
	    -hdr "h21: 21" -hdr "h22: 22" -hdr "h23: 23"
	rxresp
	expect resp.status == 500
} -run
# 1. Trigger backend workspace overflow through LostHeader in vcl_backend_fetch
# 2. Get and deliver 200 from backend, send 200 to client
client c2 {
	txreq -url "/2" -hdr "h1: 1" -hdr "h2: 2" \
	    -hdr "h3: 3" -hdr "h4: 4" -hdr "h5: 5" \
	    -hdr "h6: 6" -hdr "h7: 7" -hdr "h8: 8" \
	    -hdr "h9: 9" -hdr "h10: 10" -hdr "h11: 11" \
	    -hdr "h12: 12" -hdr "h13: 13" -hdr "h14: 14" \
	    -hdr "h15: 15" -hdr "h16: 16" -hdr "h17: 17" \
	    -hdr "h18: 18" -hdr "h19: 19" -hdr "h20: 20" \
	    -hdr "h21: 21" -hdr "h22: 22" -hdr "h23: 23"
	rxresp
	expect resp.status == 200
} -run
logexpect l1 -wait
Context
A hidden error occurred while setting a temporary request header, so a conditional in another part of the VCL never triggered, yet a 200 was still delivered to the client. See the example below. I understand that temporary variables should most likely be moved to a VMOD, and I will do that, but the error was being completely ignored when the content should have been considered invalid in my case.
Something to note from the reproducing VTC: requests go past vcl_deliver before the workspace overflow is intercepted. I found this interesting and wonder whether that is intended, given that the request is going to fail anyway.
sub vcl_backend_fetch {
	# Header is lost
	set bereq.http.do = "something";
}

sub vcl_backend_response {
	# Filter doesn't get added
	if (bereq.http.do == "something") {
		set beresp.filters += " some_filter";
	}
}
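Until the overflow fails the transaction on its own, a defensive check can turn the silent header loss into a hard failure. This is only a sketch built on the example above (the `do` header is the illustrative name from that snippet, and it assumes an unset header means the overflow swallowed it):

```vcl
sub vcl_backend_response {
	# If the header set in vcl_backend_fetch never made it onto the
	# request, assume a workspace overflow swallowed it and fail the
	# fetch instead of delivering potentially invalid content.
	if (!bereq.http.do) {
		return (abandon);
	}
}
```

`return (abandon)` aborts the fetch and results in a 503 to the client, which matches the "fail the request" behavior expected above.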
Varnish Cache version
varnish 6+
Operating system
Ubuntu 22.04
Source of binary packages used (if any)
No response
late bugwash note: We see that this is a problem, and it is not intended. We want to fail the VCL for any out-of-headers / out-of-workspace condition. 👀
Looking at IRC, there may have been questions or uncertainties about some points in this ticket, so I'll try to expand on them.
The VTC was run on open source branches: master, 7.5, 6.0.
The main concern was the backend workspace overflow that is not caught.
The client side going through the whole pipeline before noticing the workspace overflow was just something I noticed while working out the VTC. Technically it ends up with the right result, but I thought I'd point it out in case it is wasting cycles or is otherwise unintended.
Client 1 will trigger a workspace overflow in vcl_recv that isn't intercepted until after vcl_deliver has already run. This results in the 500, but later than I expected.
Client 2 will trigger a workspace overflow in vcl_backend_fetch that is ignored.
The main way this happens is through the lost-header case(s). I didn't notice any other place where a workspace overflow fails to cause an immediate failure. In almost all lost-header cases, which mark the workspace as overflowed, logic simply continues.
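The same kind of guard can be applied on the client side. A sketch reusing the `make-overflow` header from the VTC above (illustrative only, not an official mechanism for detecting overflow):

```vcl
sub vcl_deliver {
	# The header set in vcl_recv for /1 should still be present here;
	# if it is gone, a LostHeader / workspace overflow likely occurred,
	# so fail the delivery explicitly rather than returning a 200.
	if (req.url == "/1" && !req.http.make-overflow) {
		return (synth(503, "Workspace overflow"));
	}
}
```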
As a tangent: the error logged when http->nhd >= http->shd didn't make it clear that the maximum number of headers (http_max_hdr) had been reached.