BitFaster.Caching
ConcurrentLfu time-based expiry
Support all time-based expiry modes for ConcurrentLfu.
- Extend node types to support the different access order lists as described here (a rough sketch of these node shapes follows this list)
  - AccessOrderNode: the existing scheme is access order, but buffer array ops will be slower if the node type is not sealed (due to extra type checking), so derive a new sealed type (this will be a generic arg to ConcurrentLfu later)
  - AccessOrderExpiringNode: supports expire after access; just needs a time stamp
  - TimeOrderNode: supports both expire after write and custom modes; requires a time order list + time stamp
  - Think of some better names for these
- Port TimerWheel from Caffeine
- Define node policy interface required to expire items for ExpireAfterWrite, ExpireAfterAccess and ExpireAfter
- Plug all the policy hook points into ConcurrentLfuCore
- Implement an ExpireAfter policy that uses TimerWheel and IExpiryCalculator
- Implement ConcurrentTLfu based on the ExpireAfter policy
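To make the node shapes concrete, here is a rough C# sketch of the three node types named in the list above. Only the derived type names come from the list; the base class, fields, and member names are placeholders for illustration, not the final API.

```csharp
// Illustrative sketch only: names and fields mirror the task list above, not the final API.
// The base node carries the access-order links used by the window/probation/protected lists.
public class LfuNode<K, V>
{
    public K Key;
    public V Value;
    public LfuNode<K, V> Previous;   // access-order list links (prev + next)
    public LfuNode<K, V> Next;
}

// Sealed so that buffer array operations avoid extra type checks.
public sealed class AccessOrderNode<K, V> : LfuNode<K, V>
{
}

// Expire after access: only needs a time stamp (overhead: prev + next + time).
public sealed class AccessOrderExpiringNode<K, V> : LfuNode<K, V>
{
    public long TimeToExpire;        // absolute expiry time
}

// Expire after write / custom expiry: adds time-order links
// (overhead: prev + next + time + timeprev + timenext).
public sealed class TimeOrderNode<K, V> : LfuNode<K, V>
{
    public long TimeToExpire;
    public TimeOrderNode<K, V> PreviousInTimeOrder;
    public TimeOrderNode<K, V> NextInTimeOrder;
}
```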
coverage: 99.304% (+0.06%) from 99.242% when pulling 3e2451428d38d0ede4fb06098c68a3280c49a88a on users/alexpeck/lfuexpire into 434a7be8dd0a5a5ecd4c1594897d58e57a72bb4b on main.
fwiw, I implemented expireAfterAccess and expireAfterWrite prior to having the timing wheel. Otherwise I’d likely not have bothered and made them convenience builder methods that set the expireAfter policy. Instead I didn’t redo working code and left both approaches. Just fyi so you don’t think it was a performance need or something, I simply hadn’t known how to do the variable policy originally in a way that fit my design goals.
Thanks for the insight, it makes total sense. I'm figuring out how to fit this together - from reading your code in Caffeine, I see you generate a Node class with the required backing fields for each policy to minimize storage overhead.
For expireAfterAccess, since the window, probation and protected lists are each already in access order, it seems like it will be easy to walk the existing lists from the LRU position until a non-expired item is found (I think you had suggested this last year, so I had it in my mind that this part is much easier to implement). This requires a node type with one additional field for the expiry time (so the overhead is prev+next+time).
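As a minimal sketch of that walk (using a plain LinkedList with tuple entries as a stand-in for the real node list, and a hypothetical TimeToExpire field), the loop starts at the LRU end and stops at the first entry that has not yet expired:

```csharp
using System.Collections.Generic;

// Illustrative only (not the library's internals): walk an access-order list
// from the LRU end and remove entries whose expiry time has passed, stopping
// at the first non-expired entry since everything after it was accessed later.
static class ExpireAfterAccessSketch
{
    public static void ExpireFromLru<K, V>(
        LinkedList<(K Key, V Value, long TimeToExpire)> accessOrder,
        Dictionary<K, V> map,
        long now)
    {
        var node = accessOrder.First;          // head = least recently used

        while (node != null && node.Value.TimeToExpire <= now)
        {
            var next = node.Next;
            map.Remove(node.Value.Key);        // evict from the backing map
            accessOrder.Remove(node);          // unlink from the access-order list
            node = next;
        }
    }
}
```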
For expireAfterWrite and expireAfter, another linked list is required for time order, so I planned to define a node type with an extra prev/next pointer (so the overhead is prev+next+time+timeprev+timenext). Last night I figured out these can both use the same node class, but I hadn't made the mental leap that both could use the timing wheel, which would indeed be simpler.
It seems like it's still worth implementing expireAfterAccess separately to piggyback on the existing access order lists and compact the node size.
A caveat for using the access order lists is that the read buffer is allowed to drop events. That could delay the eviction of the expired entry if the head is pinned to a non-expired item, as this user experienced. This shouldn't typically be a problem if the size is bounded or the workload varies, but if the user unintentionally pins the entry then there is unbounded growth. Of course the lossiness of the read buffer is crucial for concurrent read performance, whereas writes are not lossy for correctness so expireAfterWrite does not suffer from this problem.
In that user's scenario the solution was to switch to the timing wheel based approach. This corrected the problem because the wheels turn independently and, if a misplaced entry is encountered, it is rescheduled rather than halting the expiration early. That makes it a more robust algorithm for malicious workloads.
I think that the vast majority of expiration should use expireAfterWrite (time-to-live) as a freshness mechanism, so I tend to view most usages of expireAfterAccess (time-to-idle) as a likely mistake. It's certainly very useful when applied correctly, just not commonly needed. That might mean that the optimization to remove those extraneous fields might not be worth your effort. When it works correctly it is very satisfying how elegant and simple it feels, so I am certainly not disfavoring it as a reasonable algorithmic choice if that's what you prefer.
I hadn't anticipated that. I had thought the opposite might occur - the dropped reads would fail to update the access order and expiry time of a Node, and it would be erroneously expired due to having a stale time stamp. Most likely I'm thinking of doing something that won't actually work in practice, and I haven't figured it out yet.
Thanks for all these gems of wisdom. I'm still getting my head around the timer wheel algorithm and before I get too deep into that I need to do a few rounds of changes so I can add this cleanly.
I probably updated the timestamp on the request since the entry would be buffered for some unknown amount of time and figured that fuzziness of the replay order wouldn't skew it too badly.
The timer wheel is neat but confusing, and I'm not sure how much of that is inherent or my fault. I didn't have a code reference and, since I wanted to make it as fast as I could for fun, I avoided expensive division / modulus by using power-of-two time spans, which makes the code harder to follow. I later found other implementations and most seem to use a chain of responsibility, with wheel objects where items flow downward. Those seemed more generic / flexible, but they scatter the algorithm across multiple files and perhaps do extra work. I guess at the cost of being brittle and confusing, mine is crazy fast and perfect for what I needed. I'll be interested to see if you'd approach it differently once you get comfortable with it all.
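As a rough illustration of that power-of-two trick (the constants below are invented for the example and are not Caffeine's), the bucket index for a wheel can be computed with a shift and a mask instead of division and modulus:

```csharp
// Illustrative only: with power-of-two tick spans and bucket counts, the wheel
// bucket for an expiry time can be found with a shift and a mask instead of
// division and modulus. The constants here are made up for the example.
static class TimerWheelIndexSketch
{
    const int BucketBits = 6;              // 2^6 = 64 buckets in this wheel
    const int SpanShift = 30;              // one bucket spans 2^30 ns (~1.07 s), hypothetical

    public static int BucketIndex(long expiryNanos)
    {
        long ticks = expiryNanos >> SpanShift;          // expiryNanos / 2^30, as a shift
        return (int)(ticks & ((1 << BucketBits) - 1));  // ticks % 64, as a mask
    }
}
```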
It's good to start out with the fast version and I expect I will learn a lot from it. Your frequency sketch code translated to C# really well and that was also a fun exercise.
I will try to get deeper into it this week; there are many questions in my mind about how to get this working, and I first need to retrofit the generic node type into ConcurrentLfu without making a mess. I need some focus time to analyze TimerWheel - it looks very interesting and I also want to read the paper you linked to in your source code for background.
Baseline:
| Method | Runtime | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|
| ConcurrentDictionary | .NET 6.0 | 7.128 ns | 0.1347 ns | 0.1497 ns | 1.00 | - |
| ConcurrentLfuBackground | .NET 6.0 | 21.053 ns | 0.4485 ns | 0.9060 ns | 2.89 | - |
| ConcurrentLfuForeround | .NET 6.0 | 48.797 ns | 0.9918 ns | 1.0185 ns | 6.86 | - |
| ConcurrentLfuThreadPool | .NET 6.0 | 53.980 ns | 2.2947 ns | 6.7658 ns | 6.32 | - |
| ConcurrentLfuNull | .NET 6.0 | 20.419 ns | 0.4308 ns | 0.4789 ns | 2.87 | - |
| ConcurrentDictionary | .NET Framework 4.8 | 9.301 ns | 0.2144 ns | 0.2633 ns | 1.00 | - |
| ConcurrentLfuBackground | .NET Framework 4.8 | 35.731 ns | 0.7128 ns | 0.9014 ns | 3.84 | - |
| ConcurrentLfuForeround | .NET Framework 4.8 | 72.231 ns | 1.3833 ns | 1.4206 ns | 7.77 | - |
| ConcurrentLfuThreadPool | .NET Framework 4.8 | 33.419 ns | 0.6971 ns | 1.3093 ns | 3.60 | - |
| ConcurrentLfuNull | .NET Framework 4.8 | 24.960 ns | 0.5225 ns | 0.5132 ns | 2.68 | - |
With time-based expiry changes:
| Method | Runtime | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|
| ConcurrentDictionary | .NET 6.0 | 7.309 ns | 0.1417 ns | 0.1842 ns | 1.00 | - |
| ConcurrentLfuBackground | .NET 6.0 | 21.333 ns | 0.4566 ns | 0.9926 ns | 2.94 | - |
| ConcurrentLfuForeround | .NET 6.0 | 49.709 ns | 0.6833 ns | 0.6058 ns | 6.75 | - |
| ConcurrentLfuThreadPool | .NET 6.0 | 34.427 ns | 0.6164 ns | 0.8841 ns | 4.71 | - |
| ConcurrentLfuNull | .NET 6.0 | 20.406 ns | 0.3131 ns | 0.2929 ns | 2.77 | - |
| ConcurrentDictionary | .NET Framework 4.8 | 11.454 ns | 0.2267 ns | 0.2784 ns | 1.00 | - |
| ConcurrentLfuBackground | .NET Framework 4.8 | 37.182 ns | 0.7228 ns | 0.7733 ns | 3.25 | - |
| ConcurrentLfuForeround | .NET Framework 4.8 | 73.309 ns | 1.4388 ns | 1.5992 ns | 6.41 | - |
| ConcurrentLfuThreadPool | .NET Framework 4.8 | 51.505 ns | 3.6956 ns | 10.8964 ns | 4.09 | - |
| ConcurrentLfuNull | .NET Framework 4.8 | 26.443 ns | 0.5488 ns | 0.5636 ns | 2.30 | - |
The last piece missing is to defend against frequent updates saturating the write queue, e.g. if the item was updated within the last second (or some similar tolerance), don't enqueue/reschedule.
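One possible shape for that guard, purely as a sketch with a placeholder one-second tolerance and invented names: only enqueue/reschedule when the new expiry time has moved by more than the tolerance.

```csharp
using System;

// Illustrative sketch only: avoid flooding the write queue when an entry is
// updated repeatedly. If the new expiry time is within a tolerance of the
// currently scheduled time, skip the enqueue/reschedule.
static class RescheduleGuardSketch
{
    static readonly long Tolerance = TimeSpan.FromSeconds(1).Ticks;  // placeholder tolerance

    public static bool ShouldReschedule(long scheduledExpiryTicks, long newExpiryTicks)
    {
        return Math.Abs(newExpiryTicks - scheduledExpiryTicks) > Tolerance;
    }
}
```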