Cache stampede prevention for empty cache
The current implementation only works when the data is already cached: every time the data is read from the cache, the algorithm computes a probabilistic expiration time and, if the key is considered expired, recomputes the value. But imagine the cache is empty (e.g. for the first time) under high traffic. If we get a thousand simultaneous requests, they all miss the empty cache and the system computes the data for every one of them, which again causes the cache stampede problem. Is there a way to prevent this computation from running for all requests the first time, when our key and data are not yet cached? For example, locking the key, or anything else?
// Probabilistic early expiration: each read may treat the key as expired
// slightly before its actual TTL, so recomputation is spread out over time
// instead of happening for all requests at once.
$now = microtime(true);
// Seconds elapsed since the value was last updated, rounded up to milliseconds.
$delta = ceil(1000 * ($now - $this->updated)) / 1000;
// random_int(1, PHP_INT_MAX) / PHP_INT_MAX is in (0, 1], so log(...) <= 0
// and the moment we compare against is shifted forward by a non-negative
// amount that grows with $beta.
$expired = $now - $delta * $beta * log(random_int(1, PHP_INT_MAX) / PHP_INT_MAX);
return $this->expiry <= $expired;
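To illustrate the check in isolation (a standalone sketch; xfetchExpired() is a made-up helper, not part of yiisoft/cache): with $beta = 0 the log term vanishes and the check degenerates to a plain TTL comparison, while a larger $beta makes early recomputation more likely.

```php
<?php
// Standalone sketch of the probabilistic early-expiration check above.
// xfetchExpired() is a hypothetical helper for demonstration only.
function xfetchExpired(float $expiry, float $delta, float $beta, float $now): bool
{
    // random_int(1, PHP_INT_MAX) / PHP_INT_MAX is in (0, 1], so log(...) <= 0
    // and the comparison point is shifted forward by a non-negative amount.
    $shifted = $now - $delta * $beta * log(random_int(1, PHP_INT_MAX) / PHP_INT_MAX);
    return $expiry <= $shifted;
}

$now = microtime(true);

// With beta = 0 the check is an exact TTL comparison:
var_dump(xfetchExpired($now + 60.0, 1.0, 0.0, $now)); // bool(false): not yet expired
var_dump(xfetchExpired($now - 1.0, 1.0, 0.0, $now));  // bool(true): already expired
```

With $beta > 0 the second argument ($delta, the last observed staleness window) scales how far ahead of the real expiry a recomputation may be triggered.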
One of the obvious solutions is warmup. How else would you solve it? Locking isn't a great solution for something like a cache.
The situation you have isn't "Cache stampede" but "Thundering herd".
Cache stampede, also known as a thundering herd:
When a large number of concurrent requests attempt to access the same data from the cache simultaneously, but find it missing or expired.
https://www.thetechplatform.com/amp/how-to-prevent-cache-stampede-thundering-herd-problems
I think these two are the same. What difference does it make whether the key has expired or has never been cached before? Both cause the same problem.
Wiki:
However, under very heavy load, when the cached version of that page expires, there may be sufficient concurrency in the server farm that multiple threads of execution will all attempt to render the content of that page simultaneously. Systematically, none of the concurrent servers know that the others are doing the same rendering at the same time.
Yeah, seems to be different names used for the same thing. I found that "thundering herd" is usually used for the empty-cache situation, while "stampede" is used when many cache items expire at the same time.
Anyway, common solutions are:
- Single-threaded cache warmup before making an instance available to end users. That requires some coding and overall complicates the system.
- Cache request coalescing. It doesn't work with PHP (at least in-memory) because a PHP process "dies" after each response is sent. If done with extra shared storage, it is effectively "locking".
- Locking is quite inefficient compared to other solutions.
Are there more?
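The warmup option from the list above can be as simple as a sequential deploy-time script run before the instance is put behind the load balancer. A minimal sketch (the ArrayCache stub, the keys, and the loaders are all made up here so the example is self-contained; a real application would use its configured cache instance):

```php
<?php
// ArrayCache is a stand-in for the real cache, purely so the sketch runs
// on its own. Keys and loader closures are hypothetical.
final class ArrayCache
{
    private array $data = [];

    public function getOrSet(string $key, callable $loader, int $ttl): mixed
    {
        // TTL is ignored in this stub; a real cache would honor it.
        return $this->data[$key] ??= $loader();
    }
}

$cache = new ArrayCache();

$hotKeys = [
    'homepage:articles' => fn (): array => ['article-1', 'article-2'], // e.g. a DB query
    'nav:categories'    => fn (): array => ['php', 'caching'],
];

// Sequential warmup: the backing store sees at most one computation per key,
// and end users never hit a cold cache.
foreach ($hotKeys as $key => $loader) {
    $cache->getOrSet($key, $loader, 3600);
}
```

As noted above, the cost is extra coding and deployment complexity: every known hot key has to be enumerated somewhere.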
Why is locking not a good idea?
- Higher latency.
- Reduced concurrency.
- Potential for deadlocks.
I see no other immediate solutions to the empty-cache problem, i.e. around https://github.com/yiisoft/cache/blob/master/src/Cache.php#L79 we could do the following:
public function getOrSet(
    mixed $key,
    callable $callable,
    DateInterval|int|null $ttl = null,
    Dependency $dependency = null,
    float $beta = 1.0
) {
    $key = $this->keyNormalizer->normalize($key);
    /** @var mixed */
    $value = $this->getValue($key, $beta);
    if ($value !== null) {
        return $value;
    }
    // Wait till the lease is released.
    // We'll get additional N cache reads. Better than N reads from DB but...
    while ($this->psr->get('lease_' . $key) !== null) {
        usleep(500); // The higher the value, the worse the latency is.
    }
    // Re-read: the request that held the lease has likely populated the cache.
    /** @var mixed */
    $value = $this->getValue($key, $beta);
    if ($value !== null) {
        return $value;
    }
    // How to choose a correct expiration time here? Note also that this
    // check-then-set is not atomic, so two requests may still both acquire
    // the "lease". And it doubles the number of keys and reads/writes
    // for the cache overall.
    $this->psr->set('lease_' . $key, 1, 30);
    try {
        $value = $this->setAndGet($key, $callable, $ttl, $dependency);
    } finally {
        // Might not be reached at all if the callable causes a fatal error.
        // Then everyone has to wait for the lease to be released by its TTL.
        $this->psr->delete('lease_' . $key);
    }
    return $value;
}
But the drawbacks are pretty significant.
It should be optional and disabled by default, for cases where the user solves this problem at the application level themselves.
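For comparison, here is a minimal single-flight sketch built on an OS file lock instead of an extra cache key (my own sketch, not the library's API; $get/$set callables stand in for the cache so it is self-contained). The lock disappears automatically when the holding process dies, so no lease-TTL guessing is needed; the trade-off is that it only coordinates processes on one host, since the lock lives on the local filesystem.

```php
<?php
// Single-flight via a non-blocking file lock: only one process computes,
// the others block on the lock and then re-read the cache.
function singleFlight(string $key, callable $compute, callable $get, callable $set): mixed
{
    $value = $get($key);
    if ($value !== null) {
        return $value;
    }

    // 'c' mode: create the lock file if missing, never truncate it.
    $fp = fopen(sys_get_temp_dir() . '/lease_' . md5($key), 'c');
    if (flock($fp, LOCK_EX | LOCK_NB)) {
        try {
            // We won the race: compute once and publish.
            $value = $compute();
            $set($key, $value);
        } finally {
            // Released even on exceptions; the OS also releases it if the
            // process dies, unlike a TTL-based lease key.
            flock($fp, LOCK_UN);
            fclose($fp);
        }
        return $value;
    }

    // Someone else is computing: wait for the lock, then re-read the cache.
    flock($fp, LOCK_EX); // blocks until the computing process releases it
    flock($fp, LOCK_UN);
    fclose($fp);

    return $get($key) ?? $compute(); // fall back if the winner failed
}
```

The same drawbacks around latency and reduced concurrency apply while the lock is held, so this too should stay optional.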