Partial support for Cache-Control's no-cache directive
This is the third iteration of this change, following a first iteration in #3992 that I spontaneously replaced with a second iteration in #4032 that received very helpful reviews.
In this new installment I took the design from #4032 and moved one more step in that direction.
The first important change is the stricter semantics of the `OC_F_BUSY` flag. Its main purpose is now to inform the decision to disembark a request into an objhead waiting list. It still applies in other places to identify objects meant to be looked up whose caching outcome is not yet known.
As a result, this flag is only ever raised in `hsh_insert_busyobj()`.
In a couple of places, this flag was checked as a means of coordination between a client task and a fetch task. Coordinating these tasks is the role of a boc, so the boc grew the responsibility of (indirectly) clearing the `OC_F_BUSY` flag in the fetch task.
In that regard, there are now stronger guarantees around this flag:
- it only applies to potentially shareable objects (excluding req.body, synthetic and private objects)
- there are four general paths that clear the flag:
  - when the `beresp` is ready for streaming (if `beresp.do_stream`)
  - when the `beresp` fetch completes (if `!beresp.do_stream`)
  - if the fetch fails before the step implied by `beresp.do_stream`
  - if the fetch task is never created (the busy objcore is withdrawn)
- clearing the flag, and only that, can trigger a rush
  - with the notable exception of `HSH_Insert()`, which does not involve a fetch
- the rush policy is derived from the objcore flags
- the new policy for shareable objects is to wake up all requests
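The rush policy described above can be sketched roughly as follows. This is a minimal illustration with made-up names and flag values, not the actual Varnish implementation:

```c
/* Hypothetical sketch of the rush policy at this point in the
 * series: when an objcore drops its busy flag, wake all waiting
 * requests if the object is shareable, otherwise wake a single
 * request, since a private object can serve at most one of them.
 * Flag values and function name are illustrative only. */
#define OC_F_BUSY    (1u << 0)
#define OC_F_PRIVATE (1u << 1)

static int
rush_policy(unsigned oc_flags, int waiters)
{
	if (oc_flags & OC_F_PRIVATE)
		return (1);	/* private: wake only one request */
	return (waiters);	/* shareable: wake all waiting requests */
}
```

The point of deriving the policy from the objcore flags alone is that the rusher does not need to inspect the waiting requests themselves.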
The convergence of all paths towards consistently dropping the `OC_F_BUSY` flag as soon as there is a caching outcome (or lack thereof) allows finer-grained waiting list activity (far fewer spurious wake-ups) with lower latency.
This should result in less locking on objheads overall.
A rushed request re-embarks on the lookup step with the objcore that just dropped its busy flag. This means we can perform a cache hit before entering the large lookup critical section, provided the object is shareable and matches the request. Even if the objcore was cached stale, it can now be served to all waiting clients. This enables a response-wide no-cache behavior (as opposed to per-header), which is currently prevented by the built-in VCL but is now in the realm of the possible.
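The fast path for a rushed request can be illustrated like this. The struct, flags and helper are hypothetical, a sketch of the idea rather than the actual lookup code:

```c
/* Illustrative only: a rushed request first checks the objcore it
 * was woken for, and only falls back to the full lookup under the
 * objhead lock when it cannot use that objcore directly. */
#define OC_F_BUSY    (1u << 0)
#define OC_F_PRIVATE (1u << 1)

struct objcore {
	unsigned	flags;
	int		vary_match;	/* 1 if Vary matches the request */
};

/* Return 1 when the woken request can hit this objcore directly,
 * bypassing the large lookup critical section. */
static int
fast_waitinglist_hit(const struct objcore *oc)
{
	if (oc->flags & (OC_F_BUSY | OC_F_PRIVATE))
		return (0);
	return (oc->vary_match);
}
```

Note that a stale objcore would still pass this check; whether it is served or triggers revalidation is decided later.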
A little bit of trivia: the exact same test case covering the partial no-cache support exists in all iterations of this change.
Being able to serve stale (but valid) waiting list hits solves the serialization problem for the most part. It shifts the problem towards incompatible variants, which create spurious wake-ups, compensated by all the ones that were eliminated. It makes the problem much worse when there are many variants, for example with a `Vary: User-Agent` header (which should itself be a reason to make such responses uncacheable).
In that case the `vary_notice` parameter can help diagnose such scenarios. A new `vary_limit` parameter could also be added as a mitigation, turning transactions into passes above that threshold. This is outside the scope of this patch series, since the problem already exists with very similar detrimental effects.
There was one waiting list quirk I was aware of that I completely forgot: the safety net for vcl_backend_error.
Since the default ttl+grace+keep is zero upon entering `vcl_backend_error`, we currently inject an artificial lifetime for the synthetic object so that the waiting list gets a chance to see it. With this patch series this is no longer needed, because the waiting list can now process the lack of a lifetime for a shareable object (aka `cache-control: no-cache`).
It is debatable whether we want to keep the artificial lifetime: on one hand it buffers backend errors for a little while, on the other hand it prevents new fetch attempts for a little while after epiphenomenal backend errors.
Either way, the waiting list is no longer a problem, see 5bfcd0f5857945aa7085aa68c3a934e980bda45d.
#4085 added a minor merge conflict, I will try to deal with it before bugwash.
After a first offline review with @bsdphk there are several things to change:
- keep an exponential rush even for cacheable objcores
- be less formal on [re]validation through waiting list hits
- maybe keep the `vcl_backend_error` safety net

The third point is not a strict requirement and is up for debate. If we keep the safety net, we should prune its keep period (@AlveElde noticed that the safety net goes higher than the default value for `shortlived`, defeating its purpose in this scenario).
Patch series updated as per @bsdphk's review:
- exponential rush is back (see b0f36e1dfd + 683bf6eb30)
- less formal wording (see 683bf6eb30)
- safety net still present for @nigoroll's review
Only two commits changed, the aforementioned ones.
I'm looking at the waiting list coverage that is failing in CI; I was able to reproduce it locally with some load.
This might be a bug in varnishtest that I don't understand yet. When a timeout happens, the server spec ends before the final txresp for no apparent reason (in the logs).
I made the following change:
```diff
--- bin/varnishtest/tests/r02422.vtc
+++ bin/varnishtest/tests/r02422.vtc
@@ -7,7 +7,7 @@ server s1 {
 	rxreq
 	# wait until the new version is ready
-	delay 1
+	loop 10 {delay 0.1}
 	txresp -hdr "Etag: 6"
 } -start
```
And the logs for s1 look like this:
```
**** s1 rxhdr|GET / HTTP/1.1\r
**** s1 rxhdr|Host: 127.0.0.1\r
**** s1 rxhdr|User-Agent: c4\r
**** s1 rxhdr|X-Forwarded-For: 127.0.0.1\r
**** s1 rxhdr|Via: 1.1 v1 (Varnish/trunk)\r
**** s1 rxhdr|Accept-Encoding: gzip\r
**** s1 rxhdr|X-Varnish: 1014\r
**** s1 rxhdr|\r
**** s1 rxhdrlen = 148
**** s1 http[ 0] |GET
**** s1 http[ 1] |/
**** s1 http[ 2] |HTTP/1.1
**** s1 http[ 3] |Host: 127.0.0.1
**** s1 http[ 4] |User-Agent: c4
**** s1 http[ 5] |X-Forwarded-For: 127.0.0.1
**** s1 http[ 6] |Via: 1.1 v1 (Varnish/trunk)
**** s1 http[ 7] |Accept-Encoding: gzip
**** s1 http[ 8] |X-Varnish: 1014
**** s1 bodylen = 0
** s1 === loop 10 {delay 0.1}
**** s1 Loop #0
** s1 === delay 0.1
*** s1 delaying 0.1 second(s)
**** dT 1.231
**** s1 Loop #1
** s1 === delay 0.1
*** s1 delaying 0.1 second(s)
**** dT 1.331
**** s1 Loop #2
**** s1 Loop #3
**** s1 Loop #4
**** s1 Loop #5
**** s1 Loop #6
**** s1 Loop #7
**** s1 Loop #8
**** s1 Loop #9
*** s1 shutting fd 5
** s1 Ending
```
So c4 is left hanging without a response and eventually times out.
I found the culprit, the test became racy and varnishtest was misleading (and I was also not paying enough attention).
I added 5 patches at the beginning of the series to help future troubleshooting and close the race before making it too easy to trigger.
Never mind, I only made the problem harder to reproduce on my end with better synchronization; there is something wrong linked to bringing back the exponential rush for hits.
The problem was that the exponential rush of hits was introduced without safeguards, allowing requests to rush each other in turn when a waiting list hit was followed by a VCL restart.
In r2422 the clients c5 and c6 restart upon a regular hit, and both are supposed to enter c4's waiting list. However, if one of them "lost" the race against c4, it would fail a third fetch and rush the other one, which in turn would fail a fetch and rush the other that had meanwhile re-entered the waiting list, until one of them exceeded max_restarts.
The fix introduces a fence between clients entering a waiting list and objcores rushing the waiting list. The objcores can only rush clients that entered before their initial rush.
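One way to picture the fence is a generation counter on the waiting list: each waiter records the generation at enqueue time, a rush bumps it first, and only waiters from an earlier generation are woken. This is a sketch with made-up names, not the actual fix:

```c
/* Illustrative sketch of the waiting-list fence. A request that
 * re-enters the list after a rush began carries the new generation
 * and therefore cannot be rushed again in the same round. */
struct waiter {
	unsigned	gen;	/* generation when the request entered */
};

struct waitlist {
	unsigned	gen;	/* bumped at the start of every rush */
};

static void
enter_waitlist(const struct waitlist *wl, struct waiter *w)
{
	w->gen = wl->gen;
}

static void
begin_rush(struct waitlist *wl)
{
	wl->gen++;
}

/* Only waiters that entered before the rush began may be woken. */
static int
may_rush(const struct waitlist *wl, const struct waiter *w)
{
	return (w->gen < wl->gen);
}
```

In the r2422 scenario, a client that fails its fetch and restarts re-enters with the current generation, so the objcore that triggered its wake-up can no longer rush it.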
Rebased onto master to resolve minor conflict with #4109 (touching adjacent lines in cache_varnishd.h).
Force push:
- rebase against current trunk
- squashme commits squashed where they belong
- two more commits from the enterprise side of the fence
bugwash:
- rebase onto current master to solve conflicts
- align patch series with progress made on the enterprise side
- organize review with @nigoroll after May 15th
For context, dridi's slides on the topic are here: https://github.com/varnishcache/varnish-cache/wiki/VDD23Q1#compliance
I'm done porting the enterprise improvements to this branch.
During the review call with Dridi, I cherry-picked the first 5 harmless (flw) commits.
I have marked all resolved conversations as such, and at this point only the unresolved remarks are left open. In general, I am OK with this PR.
Force-pushed to rebase onto current master to solve the conflict created by the commits from this branch already cherry-picked in master, and to address @nigoroll's remaining review items.
The test failures in CI for almalinux appear to be unrelated to this change.
edit: should be fixed in trunk with 94153629361e2cd8622a360cd2ddc097a1ebab23.
Rebased after ae83488d909cab23f201b706a530d79a2fbc4fd9.
Rebased onto current master (8cbf914a106e19bbec12f0fcd5c78348b2f52828) as requested in https://github.com/varnishcache/varnish-cache/pull/4365#issuecomment-3062459644.
To get ahead, I pushed a rebase to https://github.com/nigoroll/varnish-cache/tree/partial_nocache
How is that different from what I did?
Sorry, I guess I had seen an outdated head of your branch. I am currently traveling and it might be that bad connectivity made me draw wrong conclusions.
I thought I had done something wrong and I couldn't see the difference...
> I thought I had done something wrong and I couldn't see the difference...
Yes, it was my fault, sorry again.
After @hermunn's review I had another one from @mbgrydeland and I pushed 8 commits that came out of our discussions, half of which were directly requested by Martin.
The actual review happened on the varnish enterprise side and we are moving forward with this change. It has seen high traffic in production without causing any problem, and while this exposure to production traffic may not offer comprehensive coverage of this change, we believe that it results in all-around better behavior of waiting lists.
Please review the additional commits and if approved, I can clean up the patch series, and then rebase it onto current master to resolve the minor merge conflict with #4362.
Two test cases failed in the sanitizer job; both of them ran into EAGAIN in ASAN's pthread_create() wrapper, which I suggested addressing in https://github.com/varnishcache/varnish-cache/pull/4352#issuecomment-3005349967.
bugwash: OK, @dridi please fight any fires before release
(edit, bugwash verdict changed)
I have force-pushed to squash the last batch of review commits, and to rebase onto current master.
I'm having a pass on the code while CI is running.