otter
Thundering-herd-avoiding LoaderFunc on misses
Clarification and motivation
My go-to cache library for Go so far has been bluele/gcache - probably much less performant than many alternatives, but it has a very handy helper that coordinates loads to avoid a thundering herd: https://pkg.go.dev/github.com/bluele/gcache#readme-loading-cache
Acceptance criteria
The builder could get a new WithLoaderFunc(func(K) V) option. Calling Get(key_not_loaded_yet) concurrently from multiple goroutines would invoke the optional LoaderFunc exactly once and then return the loaded value to all goroutines at once.
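One way to pin that down is as the test we would expect to pass once the feature exists. Everything here is hypothetical: newCacheWithLoader stands in for building an otter cache with the proposed WithLoaderFunc option, which does not exist yet.

    import (
        "sync"
        "sync/atomic"
        "testing"
    )

    func TestLoaderRunsOncePerConcurrentMiss(t *testing.T) {
        var loads atomic.Int64
        // newCacheWithLoader is a hypothetical constructor for a cache built
        // with the proposed WithLoaderFunc option.
        cache := newCacheWithLoader(func(key string) string {
            loads.Add(1) // expensive load, e.g. a database query
            return "value for " + key
        })

        var wg sync.WaitGroup
        for i := 0; i < 10; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                cache.Get("hot-key") // concurrent misses share a single load
            }()
        }
        wg.Wait()

        if n := loads.Load(); n != 1 {
            t.Fatalf("loader ran %d times, want 1", n)
        }
    }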
Lol, looks like I woke up famous. 😅
@comunidadio yes, it's very likely otter will get that feature as well. For now I'm following the caffeine author's advice: "after reaching 50-60 million qps of throughput you can already focus on improving hit ratio and adding different features". 50 million qps on the caffeine benchmarks has already been reached, and soon otter will get an update that pushes it to 150+ million qps (thanks Ben), which is much closer to caffeine's performance. After that I plan to finally comment my code, since I completely neglected that during the third rewrite of the project. And then come the features: callbacks, a constant-ttl expiration policy, and loader functions too.
I believe you meant a cache stampede, which is often confused with the thundering herd problem but is slightly different. It's important to note that if you also use a remote cache, the problem exists there too, but across servers instead of virtual threads. That can be solved with a soft/hard expiration scheme, e.g. probabilistic early expiration or scaled TTLs.
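For reference, probabilistic early expiration can be sketched in a few lines; the parameter names follow the usual description of the technique ("XFetch"), not any particular library's API.

    import (
        "math"
        "math/rand"
        "time"
    )

    // shouldRefreshEarly: each reader independently decides to recompute a value
    // shortly before its real expiry, with a probability that rises as the
    // deadline approaches. delta is roughly how long the last recompute took;
    // beta > 1 makes early refreshes more aggressive (beta = 1 is the usual default).
    func shouldRefreshEarly(expiry time.Time, delta time.Duration, beta float64) bool {
        // 1 - rand.Float64() is in (0, 1], so the log is finite and <= 0.
        jitter := -float64(delta) * beta * math.Log(1-rand.Float64())
        return time.Now().Add(time.Duration(jitter)).After(expiry)
    }

A caller that gets true recomputes and re-stores the value; everyone else keeps serving the cached one, so the stampede at expiry is spread out over time.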
For a local cache you might consider supporting both blocking and non-blocking singular and batch loads. The synchronous variant can also be strengthened to support linearizable computations, e.g. computing a new value from the previous one as a blocking write operation. That makes it very flexible to adapt the cache to the application's needs and facilitates features like entry pinning. You can also perform implicit asynchronous loads, like automatically refreshing a popular entry before it expires to avoid a latency spike if it were evicted. The batch asynchronous load can also be made implicit by coalescing independent singular loads over a short duration. The backoff, timeout, retry, and fallback of these loads can be delegated to a resilience library.
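To make those distinctions concrete, one possible shape for the load variants might look like this; the names and signatures are illustrative only, not a proposal for otter's actual API.

    import "context"

    type LoadingCache[K comparable, V any] interface {
        // Blocking singular load: concurrent callers for the same missing key
        // share a single load.
        Get(ctx context.Context, key K) (V, error)
        // Blocking batch load: all missing keys are fetched in one call.
        GetAll(ctx context.Context, keys []K) (map[K]V, error)
        // Linearizable compute: remap the current value as a blocking write,
        // which also enables features like entry pinning.
        Compute(key K, remap func(old V, found bool) (V, bool)) (V, bool)
        // Non-blocking refresh: reload in the background while the current
        // value keeps being served (refresh-ahead).
        Refresh(ctx context.Context, key K)
    }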
I think coalescing would make sense to add to this library, as it's needed to write read-through caching code. I'm in the market for a better cache library for Go, but the candidates I've looked at generally don't have read-through support.
You can cobble it together yourself (here using the Go singleflight library):
    var sfg singleflight.Group // from golang.org/x/sync/singleflight

    func Get(key string) (any, error) {
        val, err, _ := sfg.Do(key, func() (any, error) {
            // Only one goroutine per key runs this function; concurrent
            // callers for the same key wait and share its result.
            val, ok := cache.Get(key)
            if ok {
                return val, nil
            }
            // ... fetch ...
            cache.Set(key, val)
            return val, nil
        })
        return val, err
    }
However, since this code executes outside the cache, you have to grab the coalescing slot (the per-key singleflight lock) before you even know whether the key is cached. So you really want something like:
    var sfg singleflight.Group

    func Get(key string) (any, error) {
        // Fast path: serve hits without touching singleflight at all.
        val, ok := cache.Get(key)
        if ok {
            return val, nil
        }
        val, err, _ := sfg.Do(key, func() (any, error) {
            // Re-check inside the flight: another goroutine may have
            // populated the key while we were waiting for the slot.
            val, ok := cache.Get(key)
            if ok {
                return val, nil
            }
            // ... fetch ...
            cache.Set(key, val)
            return val, nil
        })
        return val, err
    }
…but this probably locks more than necessary. If the cache itself can own all the locking, it can likely be made more efficient.
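As a toy illustration of what "the cache owns the locking" could look like (not otter's implementation): the miss path installs a shared in-flight promise under the same lock the lookup uses, so there is no separate coalescing structure to lock first.

    import "sync"

    // inflight is a promise for a value that is currently being loaded.
    type inflight[V any] struct {
        done chan struct{}
        val  V
        err  error
    }

    type loadingCache[K comparable, V any] struct {
        mu      sync.Mutex
        values  map[K]V
        pending map[K]*inflight[V]
        load    func(K) (V, error)
    }

    func newLoadingCache[K comparable, V any](load func(K) (V, error)) *loadingCache[K, V] {
        return &loadingCache[K, V]{
            values:  make(map[K]V),
            pending: make(map[K]*inflight[V]),
            load:    load,
        }
    }

    func (c *loadingCache[K, V]) Get(key K) (V, error) {
        c.mu.Lock()
        if v, ok := c.values[key]; ok { // hit: one lock, no extra coalescing step
            c.mu.Unlock()
            return v, nil
        }
        if p, ok := c.pending[key]; ok { // someone is already loading this key
            c.mu.Unlock()
            <-p.done
            return p.val, p.err
        }
        p := &inflight[V]{done: make(chan struct{})}
        c.pending[key] = p // the coalescing slot is claimed under the same lock
        c.mu.Unlock()

        p.val, p.err = c.load(key) // ... fetch ...

        c.mu.Lock()
        if p.err == nil {
            c.values[key] = p.val
        }
        delete(c.pending, key)
        c.mu.Unlock()
        close(p.done) // wakes every waiter; they read p.val / p.err after this
        return p.val, p.err
    }

A real cache would shard this lock and add eviction, but the structure shows why owning the lookup lets the hit path stay cheap.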
Serving stale data on error is another feature that is lovely to have, and dovetails with the above. If the "fetch" function fails, the cache should still be able to serve the previous value for a while, but only up to a certain maximum duration.
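A minimal, single-goroutine sketch of that policy, assuming a freshFor/staleFor pair of knobs (both names are made up for illustration, not part of any existing API):

    import "time"

    type entry[V any] struct {
        value    V
        storedAt time.Time
    }

    type staleCache[K comparable, V any] struct {
        entries  map[K]entry[V]
        freshFor time.Duration // how long a value is served without reloading
        staleFor time.Duration // how much longer it may be served if a reload fails
    }

    func (c *staleCache[K, V]) getOrLoad(key K, load func(K) (V, error)) (V, error) {
        e, ok := c.entries[key]
        if ok && time.Since(e.storedAt) < c.freshFor {
            return e.value, nil // fresh hit
        }
        v, err := load(key)
        if err != nil {
            // The fetch failed: keep serving the previous value, but only up
            // to the maximum allowed staleness.
            if ok && time.Since(e.storedAt) < c.freshFor+c.staleFor {
                return e.value, nil
            }
            var zero V
            return zero, err
        }
        c.entries[key] = entry[V]{value: v, storedAt: time.Now()}
        return v, nil
    }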