squid
squid copied to clipboard
WIP: Fix foreground cache_dir indexing
Foreground cache rebuild (squid -F
) was unusable in many environments:
-
Non-SMP: Squid started listening for incoming connections and then ignored those connections while building the index, clients timed out.
-
SMP: Workers started serving incoming connections, but diskers ignored I/O requests from workers, and workers reported disk I/O timeouts, significantly slowing down transactions. With enough traffic, disker queues overflowed as well.
Also, during foreground indexing, non-SMP Squid process or SMP diskers were blocked and, hence, did not react immediately to signal-based commands such as shutdown and reconfiguration. These commands took effect only after the cache index was built.
Now Squid indexes all cache_dirs first (while responding to signals) and starts listening for incoming connections second.
A new RegisteredRunner::useFullyIndexedStore event now activates subscribers waiting for a fully indexed Store. Besides the obvious use case of opening listening ports, we found and fixed two other cases:
-
StoreDigest:
-F
now delays the start of Store digest computation. Making an essentially empty digest and then replacing it with a very different digest of an indexed cache is likely to harm peers. Perhaps-F
should wait until the end of the digest computation as well, but that idea deserves a dedicated consideration/PR. -
PeerPoolMgr:
-F
now delays peer polling. Opening and maintaining connections, especially TLS connections, may require faster reaction times than a foreground indexing Squid can provide. Also, seeing our connections, polled peers may incorrectly assume that our instance is accepting connections and may send cache manager requests (or worse).
In SMP mode, cache_dirs are indexed by diskers. Detecting
useFullyIndexedStore condition in kids requires a new IPC message. In
-F
mode, workers do not receive access to cache_dirs until those are
indexed. To prevent cache_dir discovery timeouts in workers, we also
added a new IPC message to extend the wait.
The SMP part of the implementation relies on new IPC messages:
-
mtRebuildFinished: Each disker now sends a new IPC message notifying everybody (via Coordinator) that it has finished indexing. This allows SMP workers to detect useFullyIndexedStore conditions.
-
mtForegroundRebuild: With
-F
, Coordinator does not announce a cache_dir X to workers until foreground indexing of X is complete. To prevent cache_dir search timeouts in workers, indexing diskers sent periodic "I am still working" messages (via Coordinator). Workers keep waiting for mtStrandReady, rescheduling their search timeouts.
I am posting this unpolished Draft PR to ask the following two related questions:
-
To fix foreground indexing, we have to add a few IPC messages. We have discovered that the existing HereIamMessage and new IPC messages have the same underlying structure -- they only differ in the message type field. The current PR code replaces all those message types with a single StrandMessage type. We could reduce this PR diff by introducing StrandMessage in a dedicated PR. It will look a bit strange because that refactoring is not really needed without this PR. Should we extract those changes into a dedicated PR?
-
Testing discovered that IPC messages destined to kid A1 can be successfully delivered to kid A2 when kid A1 crashes and is restarted as A2. When this happens, Squid may crash and/or misbehave, often producing confusing logging/artifacts. The current PR does not address and ignores this problem. Should we fix it in this PR (where the problem was discovered and arguably made worse by introducing more IPC messages) or do another PR dedicated to this fix?
IMO, ideally, both changes should be extracted, but I am looking for @squid-cache/commiters opinions. I do not want to spend cycles on creating three PRs if others are happy with one.
I'm in favour of splitting. It would be somewhat easier to review in smaller PR chunks. Your call though.
Thank you for a prompt and clear answer. We will probably split in two or three parts. Until then, please ignore this draft PR.
Status update:
split in two or three parts
I separated the first part as PR756.
PR #756 has been merged.
The second prerequisite pull request -- PR #771 -- has been merged as well.
We are now polishing this draft PR based on the merged changes.