
Akamai Purger's throughput configuration should just be "count of peer copies"

Open · jcjones opened this issue 1 year ago · 2 comments

The throughput mechanism for Akamai Purger has a complex validation function:

https://github.com/letsencrypt/boulder/blob/6ee675f2f0729bba29d92a712e7c611a75b9acc8/cmd/akamai-purger/main.go#L92-L121

This is nice for ensuring that whatever values the SREs configure are sane. But we run the purger in multiple DCs, so in practice this function is immaterial: the real limits are lower than what it validates against, and it's up to the SRE to solve the system of equations and derive a limit that will protect the purger from itself. That raises the question: why is this code here?

It would actually be a QoL and quality-of-config improvement if we could instead tell the Akamai Purger "there are X copies of you running, so use 1/X of the allowable limits."

jcjones · May 15 '24 17:05

Another option (this is a half-baked idea): use Consul/SRV records to keep a dynamic count of how many instances are running at a given time, and adjust automatically. Then we could scale up or down at will without needing to touch the config at all. Maybe we'd still want some config values that serve as fallbacks or maximums, but I haven't thought that part through yet.

Preston12321 · May 15 '24 19:05

That would require being able to count the peer datacenters, and thus is a much bigger lift on both sides.

jcjones · May 15 '24 19:05