
Partial support for Cache-Control's no-cache directive

Open dridi opened this issue 1 year ago • 18 comments

This is the third iteration of this change, following a first iteration in #3992 that I spontaneously replaced with a second iteration in #4032 that received very helpful reviews.

In this new installment I took the design from #4032 and moved one more step in that direction.

The first important change is the strict semantics of the OC_F_BUSY flag. Its main purpose is now to inform the decision to disembark a request into an objhead waiting list. It still applies in other places to identify objects meant to be looked up whose caching outcome is not yet known.

As a result, this flag is only ever raised in hsh_insert_busyobj().

In a couple of places, this flag was checked as a means of coordination between a client task and a fetch task. Coordinating these tasks is the role of a boc, so the boc grew the responsibility of (indirectly) clearing the OC_F_BUSY flag in the fetch task.

In that regard, there are now stronger guarantees around this flag:

  • it only applies to potentially shareable objects (excluding req.body, synthetic and private objects)
  • there are four general paths that clear the flag
    • when the beresp is ready for streaming (if beresp.do_stream)
    • when the beresp fetch completes (if ! beresp.do_stream)
    • if the fetch fails before the step implied by beresp.do_stream
    • if the fetch task is never created (the busy objcore is withdrawn)
  • clearing the flag, and only that, can trigger a rush
    • with the notable exception of HSH_Insert(), not involving a fetch
  • the rush policy is derived from the objcore flags
    • the new policy for shareable objects is to wake up all requests

The convergence of all paths towards consistently dropping the OC_F_BUSY flag as soon as there is a caching outcome (or lack thereof) allows finer-grained waiting list activity (far fewer spurious wake-ups) with lower latency.

This should result in overall less locking on objheads.

A rushed request re-embarks at the lookup step with the objcore that just dropped its busy flag. This means we can perform a cache hit before entering the large lookup critical section, provided the object is shareable and matches the request. Even if the objcore was cached stale, it can now be served to all waiting clients. This enables a response-wide no-cache behavior (as opposed to per-header), which is currently prevented by the built-in VCL but is now in the realm of the possible.

A little bit of trivia: the exact same test case covering the partial no-cache support exists in all iterations of this change.

Being able to serve stale (but valid) waiting list hits solves the serialization problem for the most part. It shifts the problem towards incompatible variants, which create spurious wake-ups, compensated by all the ones that were eliminated. The problem gets much worse when there are many variants, for example with a Vary: User-Agent header (which should be a reason to make such responses uncacheable).

In that case the vary_notice parameter can help diagnose such scenarios. A new vary_limit parameter could also be added as a mitigation, to result in pass transactions above that threshold. This is outside of the scope of this patch series, since this problem already exists with very similar detrimental effects.
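
As an operational note, the vary_notice parameter mentioned above can already be tuned at runtime; a rough illustration (the threshold value here is arbitrary):

```shell
# Log a notice when a lookup walks more than 5 variants of an object
# (vary_notice is an existing varnishd parameter; 5 is arbitrary).
varnishadm param.set vary_notice 5

# Then watch for the resulting notices in the log:
varnishlog -g raw -i Notice
```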

dridi avatar Mar 05 '24 11:03 dridi

There was one waiting list quirk I was aware of that I completely forgot: the safety net for vcl_backend_error.

Since the default ttl+grace+keep is zero upon entering vcl_backend_error, we currently inject an artificial lifetime for the synthetic object to give the waiting list a chance to see it. With this patch series, this is no longer needed, because the waiting list can now process a lack of lifetime for a shareable object (aka cache-control: no-cache).

It is debatable whether we want to keep the artificial lifetime: on one hand it buffers backend errors for a little while, on the other hand it prevents new fetch attempts for a little while after transient backend errors.
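
If the artificial lifetime is kept but pruned, the effect could also be approximated from VCL; a hypothetical sketch, with arbitrary values chosen to stay below shortlived:

```vcl
sub vcl_backend_error {
	# Keep the synthetic error object just long enough for the
	# waiting list, without outliving shortlived (values arbitrary).
	set beresp.ttl = 1s;
	set beresp.grace = 0s;
	set beresp.keep = 0s;
}
```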

Either way, the waiting list is no longer a problem, see 5bfcd0f5857945aa7085aa68c3a934e980bda45d.

dridi avatar Mar 19 '24 13:03 dridi

#4085 added a minor merge conflict, I will try to deal with it before bugwash.

dridi avatar Apr 08 '24 12:04 dridi

After a first offline review with @bsdphk there are several things to change:

  • keep an exponential rush even for cacheable objcores
  • be less formal on [re]validation through waiting list hits
  • maybe keep the vcl_backend_error safety net

The third point is not a strict requirement, up for debate. If we keep the safety net, we should prune its keep period (@AlveElde noticed that the safety net goes higher than the default value for shortlived, defeating its purpose in this scenario).

dridi avatar Apr 09 '24 19:04 dridi

Patch series updated as per @bsdphk's review:

  • exponential rush is back (see b0f36e1dfd + 683bf6eb30)
  • less formal wording (see 683bf6eb30)
  • safety net still present for @nigoroll's review

Only two commits changed, the aforementioned ones.

dridi avatar May 06 '24 08:05 dridi

I'm looking at the waiting list coverage that is failing in CI, I was able to reproduce it locally with some load.

dridi avatar May 06 '24 09:05 dridi

This might be a bug in varnishtest that I don't understand yet. When a timeout happens, the server spec ends before the final txresp for no apparent reason (in the logs).

I made the following change:

--- bin/varnishtest/tests/r02422.vtc
+++ bin/varnishtest/tests/r02422.vtc
@@ -7,7 +7,7 @@ server s1 {
 
        rxreq
        # wait until the new version is ready
-       delay 1
+       loop 10 {delay 0.1}
        txresp -hdr "Etag: 6"
 } -start
 

And the logs for s1 look like this:

**** s1    rxhdr|GET / HTTP/1.1\r
**** s1    rxhdr|Host: 127.0.0.1\r
**** s1    rxhdr|User-Agent: c4\r
**** s1    rxhdr|X-Forwarded-For: 127.0.0.1\r
**** s1    rxhdr|Via: 1.1 v1 (Varnish/trunk)\r
**** s1    rxhdr|Accept-Encoding: gzip\r
**** s1    rxhdr|X-Varnish: 1014\r
**** s1    rxhdr|\r
**** s1    rxhdrlen = 148
**** s1    http[ 0] |GET
**** s1    http[ 1] |/
**** s1    http[ 2] |HTTP/1.1
**** s1    http[ 3] |Host: 127.0.0.1
**** s1    http[ 4] |User-Agent: c4
**** s1    http[ 5] |X-Forwarded-For: 127.0.0.1
**** s1    http[ 6] |Via: 1.1 v1 (Varnish/trunk)
**** s1    http[ 7] |Accept-Encoding: gzip
**** s1    http[ 8] |X-Varnish: 1014
**** s1    bodylen = 0
**   s1    === loop 10 {delay 0.1}
**** s1    Loop #0
**   s1    === delay 0.1
***  s1    delaying 0.1 second(s)
**** dT    1.231
**** s1    Loop #1
**   s1    === delay 0.1
***  s1    delaying 0.1 second(s)
**** dT    1.331
**** s1    Loop #2
**** s1    Loop #3
**** s1    Loop #4
**** s1    Loop #5
**** s1    Loop #6
**** s1    Loop #7
**** s1    Loop #8
**** s1    Loop #9
***  s1    shutting fd 5
**   s1    Ending

So c4 is left hanging without a response and eventually times out.

dridi avatar May 06 '24 10:05 dridi

I found the culprit, the test became racy and varnishtest was misleading (and I was also not paying enough attention).

dridi avatar May 06 '24 12:05 dridi

I added 5 patches at the beginning of the series to help future troubleshooting and close the race before making it too easy to trigger.

dridi avatar May 06 '24 12:05 dridi

Never mind, I only made the problem harder to reproduce on my end with better synchronization; something is wrong linked to bringing back the exponential rush for hits.

dridi avatar May 06 '24 12:05 dridi

The problem was that the exponential rush of hits was introduced without safeguards, allowing requests to rush each other in turn when a waiting list hit was followed by a VCL restart.

In r2422 the clients c5 and c6 restart upon a regular hit and are both supposed to enter c4's waiting list. However, if one of them "lost" the race against c4, it would fail a third fetch and rush the other one, which in turn would fail a fetch and rush the first, which had meanwhile re-entered the waiting list, until one of them exceeded max_restarts.

The fix introduces a fence between clients entering a waiting list and objcores rushing the waiting list. The objcores can only rush clients that entered before their initial rush.

dridi avatar May 06 '24 15:05 dridi

Rebased onto master to resolve minor conflict with #4109 (touching adjacent lines in cache_varnishd.h).

dridi avatar Jun 10 '24 06:06 dridi

Force push:

  • rebase against current trunk
  • squashme commits squashed where they belong
  • two more commits from the enterprise side of the fence

dridi avatar Dec 09 '24 12:12 dridi

bugwash:

  • rebase onto current master to solve conflicts
  • align patch series with progress made on the enterprise side
  • organize review with @nigoroll after May 15th

dridi avatar Apr 30 '25 13:04 dridi

For context, dridi's slides on the topic are here: https://github.com/varnishcache/varnish-cache/wiki/VDD23Q1#compliance

nigoroll avatar Apr 30 '25 13:04 nigoroll

I'm done porting the enterprise improvements to this branch.

dridi avatar May 05 '25 16:05 dridi

During the review call with Dridi, I cherry-picked the first 5 harmless (flw) commits.

nigoroll avatar May 20 '25 12:05 nigoroll

I have marked all resolved conversations as such, and at this point only the unresolved remarks remain open. In general, I am OK with this PR.

nigoroll avatar May 20 '25 13:05 nigoroll

Force-pushed to rebase onto current master to solve the conflict created by the commits from this branch already cherry-picked in master, and to address @nigoroll's remaining review items.

The test failures in CI for almalinux appear to be unrelated to this change.

edit: should be fixed in trunk with 94153629361e2cd8622a360cd2ddc097a1ebab23.

dridi avatar May 21 '25 09:05 dridi

Rebased after ae83488d909cab23f201b706a530d79a2fbc4fd9.

dridi avatar Jul 07 '25 08:07 dridi

Rebased onto current master (8cbf914a106e19bbec12f0fcd5c78348b2f52828) as requested in https://github.com/varnishcache/varnish-cache/pull/4365#issuecomment-3062459644.

dridi avatar Jul 11 '25 14:07 dridi

To get ahead, I pushed a rebase to https://github.com/nigoroll/varnish-cache/tree/partial_nocache

nigoroll avatar Jul 11 '25 16:07 nigoroll

How is that different from what I did?

dridi avatar Jul 11 '25 16:07 dridi

Sorry, I guess I had seen an outdated head of your branch. I am currently traveling and it might be that bad connectivity made me draw wrong conclusions.

nigoroll avatar Jul 11 '25 17:07 nigoroll

I thought I had done something wrong and I couldn't see the difference...

dridi avatar Jul 11 '25 18:07 dridi

I thought I had done something wrong and I couldn't see the difference...

Yes, it was my fault, sorry again.

nigoroll avatar Jul 13 '25 08:07 nigoroll

After @hermunn's review I had another one from @mbgrydeland and I pushed 8 commits that came out of our discussions, half of which were directly requested by Martin.

The actual review happened on the varnish enterprise side and we are moving forward with this change. It has seen high traffic in production without causing any problem, and while this exposure to production traffic may not offer comprehensive coverage of this change, we believe that it results in all-around better behavior of waiting lists.

Please review the additional commits and if approved, I can clean up the patch series, and then rebase it onto current master to resolve the minor merge conflict with #4362.

dridi avatar Aug 26 '25 10:08 dridi

Two test cases failed in the sanitizer job; both ran into EAGAIN in ASAN's pthread_create() wrapper, which I suggested addressing in https://github.com/varnishcache/varnish-cache/pull/4352#issuecomment-3005349967.

dridi avatar Aug 26 '25 10:08 dridi

bugwash: OK, @dridi please fight any fires before release

(edit, bugwash verdict changed)

nigoroll avatar Aug 27 '25 13:08 nigoroll

I have force-pushed to squash the last batch of review commits, and to rebase onto current master.

I'm having a pass on the code while CI is running.

dridi avatar Aug 27 '25 14:08 dridi