varnish-cache Race in ESI sub-request failure mitigation

The s390 vtester exposed a race in handling of failing sub-requests, if the request fails fast enough, we send the response, for instance a 503, to the client.

On the s390, different thread scheduling makes the if (req->objcore->flags & OC_F_FAILED) test in cnt_fetch() true, sends the request to vcl_error, and esi_deliver.c just spits out that 503.

But it seems that the problem is deeper than that, as shown by this VTC, a variation of e00035 where the errors are generated internal to Varnish instead of by the backend:

varnishtest "ESI fragment fetch fail"

server s1 {
        rxreq
        expect req.url == "/abort"
        txresp -hdr {surrogate-control: content="ESI/1.0"} \
            -body {before <esi:include src="/fail"/> after}

} -start

varnish v1 -cliok "param.set feature +esi_disable_xml_check"
varnish v1 -cliok "param.set feature +esi_include_onerror"
varnish v1 -vcl+backend {
        sub vcl_backend_fetch {
                if (bereq.url == "/fail") {
                        return (error(503));
                }
        }
        sub vcl_backend_response {
                set beresp.do_esi = beresp.http.surrogate-control ~ "ESI/1.0";
                unset beresp.http.surrogate-control;
        }
} -start

client c1 {
        txreq -url "/abort"
        non_fatal
        rxresp
        expect resp.body == "before "
} -run

server s1 -wait

server s1 {
        rxreq
        expect req.url == "/continue"
        txresp -hdr {surrogate-control: content="ESI/1.0"} \
            -body {before <esi:include src="/fail" onerror="continue"/> after}

} -start

client c1 {
        fatal
        txreq -url "/continue"
        rxresp
        expect resp.body == "before  after"
} -run

It guess esi_deliver needs to scrutinize resp->status and only accept 200 ?

Nov 21 '22 22:11 bsdphk

@ThijsFeryn brought redirects up to my attention a while ago and the conclusion I had reached is that we probably want to change our built-in VCL:

sub vcl_deliver {
        if (req.esi_level > 0 && resp.status != 200 && resp.status != 204) {
                return (fail);
        }
}

My rationale is that we are stitching the representation of a resource together from the representation of fragments of the resource, so they need to be successful representations aka 200 responses (but 204 is probably fine too).

edit: changed vcl_recv to vcl_deliver as intended in the first place

Nov 23 '22 08:11 dridi

bugwash: no redundant built-in VCL handling.

Jan 30 '23 14:01 dridi