
HHH-15305 Update custom LIRS implementation based on BoundedConcurrentHashMap

franz1981 opened this issue 3 years ago • 1 comment

This uses a separate atomic counter for the size.

Potential improvements (benchmarks are needed to see whether it makes sense to push things this far):

  • using separate queue add/remove counters with different access patterns: the former can use an atomic increment, while the latter doesn't need one (removal always happens under the lock); the size can be computed from both
  • using parity bits (or other tricks, e.g. swapping the counter) on the counters to mark state (evicting, etc.) and act cooperatively (not covered by the original paper), e.g. not trying to enter the lock to perform eviction while it is already being performed by another thread
  • using manual padding to protect the most accessed fields from false sharing (possible both with a single counter and with separate counters)

These changes are useful both to improve the scalability of the solution and to make the algorithm more cooperative.
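A minimal sketch of the split-counter idea with manual padding; all names here (`SplitSizeCounter` etc.) are mine for illustration, not from the actual PR:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: split the single size counter into an atomically
// incremented "added" counter (hit by concurrent writers) and a "removed"
// counter that is only ever written under the eviction lock.
final class SplitSizeCounter {
    // manual padding: unused longs around the hot counter reduce the chance
    // of false sharing with neighbouring fields (the JDK-internal
    // @Contended annotation would be the alternative)
    @SuppressWarnings("unused")
    private long p0, p1, p2, p3, p4, p5, p6;
    private final AtomicLong added = new AtomicLong();
    @SuppressWarnings("unused")
    private long q0, q1, q2, q3, q4, q5, q6;
    // single writer (the thread holding the eviction lock), so a plain read
    // plus a volatile store is enough; no atomic read-modify-write needed
    private volatile long removed;

    void onAdd() {
        added.incrementAndGet();
    }

    void onRemoveUnderLock(int n) {
        removed = removed + n;
    }

    long size() {
        // reading removed first yields an over-estimate at worst, which is
        // the safe direction when deciding whether to trigger eviction
        long r = removed;
        return added.get() - r;
    }
}
```

The point of the split is that only additions pay for contended atomic increments; removals, being serialized by the lock anyway, get away with a cheap volatile store.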

An additional thought could be replacing the concurrent linked queue with something "different":

  • the FIFO policy is not that important for the algorithm
  • eviction relies on a linked-queue remove (O(n))
  • it's a multi-producer multi-consumer data structure: it could be made multi-producer, single-consumer

franz1981 avatar May 31 '22 14:05 franz1981

One of the adverse effects of this PR is that, in order to reduce contention on the newly introduced size field, it performs a batch update after the entries have been expired. This means an incoming hit may find size still beyond the expiration threshold and will try to enter the lock: it will eventually fail to do so, because the current expiration action should already have cleaned up most of the entries.

This could be avoided in different ways, in case it matters:

  1. not performing any batched update of size during expiration, i.e. calling decrement after each successful poll/removal
  2. instead of using a lock, the first thread that finds the size beyond the threshold can try to compare-and-set a parity bit alongside it, "marking" the state within the size as "expiration started", and then take the lock; any concurrent hit won't even try to enter the lock or expire (while incrementAndGet will still work fine :P) because it is aware that another thread is already taking care of it

For the latter approach, some care must be taken to ensure:

  • the thread performing expiration doesn't leave some work undone, or
  • a concurrent hit that sees an (almost completed) expiration state takes care of performing expiration if needed
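The parity-bit idea in (2) could be sketched roughly like this (a hypothetical illustration, not code from the PR): the lowest bit of the size word doubles as an "expiration in progress" flag, each entry counts as 2, and only the thread that wins the CAS proceeds to take the lock:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of idea (2): reserve the lowest bit of the size
// counter as an "eviction in progress" flag and keep the actual count in
// the remaining bits, so each entry counts as 2.
final class EvictionAwareSize {
    private static final long EVICTING = 1L;
    private final AtomicLong state = new AtomicLong();

    long size() {
        return state.get() >>> 1;        // drop the flag bit
    }

    void increment() {
        state.addAndGet(2);              // works even while the flag is set
    }

    // Returns true only for the single thread that wins the CAS; concurrent
    // hits that observe the flag already set skip the lock entirely.
    boolean tryStartEviction(long threshold) {
        long s = state.get();
        return (s & EVICTING) == 0
                && (s >>> 1) > threshold
                && state.compareAndSet(s, s | EVICTING);
    }

    void finishEviction(int evicted) {
        // clear the flag and subtract the evicted entries in one atomic add;
        // concurrent increments interleave safely
        state.addAndGet(-(2L * evicted) - EVICTING);
    }
}
```

Note the CAS in `tryStartEviction` can also fail spuriously against a concurrent increment; the caller can simply re-read and retry, or skip, since someone will re-trigger eviction on the next hit.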

There are a few strategies to fix the potential issue, anyway; I've reported some just for the record.

franz1981 avatar Jun 01 '22 10:06 franz1981

finally reviewing this properly, sorry for the delay :) Looks great, I'm preparing to merge it.

Sanne avatar Jan 12 '23 22:01 Sanne

(pushed it manually, so it looks closed here)

Sanne avatar Jan 12 '23 22:01 Sanne

I'm a bit late but stumbled onto this. This class was written for Infinispan in conjunction with CLHM & Guava's Cache, as we were discussing how to design a concurrent cache and settled on that source paper. You'll see a comment that the LIRS code is based on a proof-of-concept version in our repo that was abandoned (it has a low hit rate, with lots of missing pieces, as the paper is confusing). Infinispan later rewrote their cache, but it suffered from high contention and a memory leak, and we had a lot of helpful discussions when they switched to Caffeine (benchmarks). That project combines a lot of our experience from these variants and is closer to CLHM, which avoided some of the issues you found (e.g. by maintaining a processing state machine).

If you end up diving into this again, you might review Caffeine to see about borrowing its improvements. For example, the CLQ is replaced by striped MPSC ring buffers, which gives a massive speedup (they are lossy, so the access history is sampled). You could also remove all of the LIRS logic, since it doesn't actually improve hit rates (a correct implementation is in the simulator). Also, if you ever get bored, optimization ideas are always welcome. 🙂

ben-manes avatar Mar 08 '23 18:03 ben-manes

Never late @ben-manes , especially with such good advice!

If you end up diving into this again, you might review Caffeine to see about borrowing its improvements

I will, already added in my to-do list, thanks!

Also, if you ever get bored, optimization ideas are always welcome.

I am more into the concurrent msg passing queues thing, but I would definitely enjoy it :)

franz1981 avatar Mar 08 '23 19:03 franz1981

Hi @ben-manes ! great to see you here as well :)

Right, I most definitely would wish to get rid of such code from our repository and delegate to Caffeine; not only because I trust Caffeine from my experience with Quarkus and Infinispan, but also because we admittedly don't have the right expertise in high-performance cache implementations among the maintainers of this project, nor the capacity to maintain a complex set of concurrency & performance benchmarks - it just makes sense to delegate the responsibility to established libraries.

But I'm considering that a next step, something to consider for Hibernate ORM 6.3+. I don't think we'd want to make Caffeine a mandatory dependency - that gets really tricky for end users needing to align versions, especially with popular libraries such as these two - so we should nail it in such a way that it's optional. This implies we'd still need a fallback implementation, at least for the time being. I'd be fine for that to be deprecated and less thoroughly tested, while recommending Caffeine.

So my vague idea (it totally needs a POC first) is to extend the 2nd level cache SPI we define: the 2nd level cache integration point is where we allow people to plug in custom caches for the data; clearly, any cache implementation that people would choose for the data potentially also has the capability to provide a better implementation for the uses we currently have for BoundedConcurrentHashMap. So perhaps we can extend this integration point (again, it would need to happen iteratively) so that whatever cache is plugged in will also serve such needs.
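As a purely hypothetical illustration of that direction (every name below is invented; this is not an existing Hibernate SPI), the widened integration point might look something like this, with a deliberately simple fallback standing in for the current BoundedConcurrentHashMap:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: a provider contract that a 2nd level cache
// integration (e.g. a Caffeine-backed one in Quarkus) could implement,
// so the same plugged-in cache also serves internal bounded-map needs.
interface InternalCacheProvider {
    <K, V> Map<K, V> createBoundedMap(int maxEntries);

    // Fallback when nothing is plugged in: an access-ordered LinkedHashMap
    // capped at maxEntries, wrapped in a synchronized view. Deliberately
    // naive; it only stands in for the built-in implementation.
    static InternalCacheProvider fallback() {
        return new InternalCacheProvider() {
            @Override
            public <K, V> Map<K, V> createBoundedMap(int maxEntries) {
                return Collections.synchronizedMap(
                    new LinkedHashMap<K, V>(16, 0.75f, true) {
                        @Override
                        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                            return size() > maxEntries;
                        }
                    });
            }
        };
    }
}
```

The shape of the contract matters more than the fallback: callers only ever ask for "a bounded map", so whichever cache the user plugged in for the data can transparently supply it.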

cc/ @sebersole

Sanne avatar Mar 09 '23 14:03 Sanne

Oh, I didn't mean to imply replacing this with Caffeine. Certainly if your team was spending time evolving it for a critical need then I'd have suggestions of where you could borrow improvements. I think minimizing dependencies is very pragmatic for libraries, e.g. caffeine embeds a copy of a jctools queue (a project that @franz1981 contributes to). This code has worked fine and needed little maintenance, so as long as there are no demands or unnecessary proliferation it seems very reasonable to keep as is (@vblagoje did a great job here and it was fun to brainstorm with him on the algorithms).

If this does become a maintenance burden then it might be overkill, as I see it only used in 2 places: QueryInterpretationCacheStandardImpl and StatsNamedContainer. These are both only ever set to a max size of 20 entries, which makes it very tiny. I don't know if those cases require high concurrency, but it is small enough that even a naive and simple cache might be enough (e.g. random replacement via a direct-mapped AtomicReferenceArray). It doesn't seem like these cases deserve making something pluggable or promoting the addition of a dependency. It also seems pretty safe to leave things as is, so without a pressing need I think this is great.
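The "naive and simple cache" suggested here could be sketched as follows (the names are mine, not an existing class): a direct-mapped table over AtomicReferenceArray where each key hashes to exactly one slot and a colliding put simply displaces the previous entry, which approximates random replacement:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Minimal sketch of a lock-free, direct-mapped cache: no eviction queue,
// no counters; the replacement policy is implicit (whatever previously
// occupied the slot is overwritten).
final class DirectMappedCache<K, V> {
    private final AtomicReferenceArray<Map.Entry<K, V>> slots;

    DirectMappedCache(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    private int indexOf(K key) {
        // spread the hash bits a little before taking the modulus
        int h = key.hashCode();
        h ^= (h >>> 16);
        return Math.floorMod(h, slots.length());
    }

    V get(K key) {
        Map.Entry<K, V> e = slots.get(indexOf(key));
        // a slot collision shows up as a different key: treat it as a miss
        return (e != null && e.getKey().equals(key)) ? e.getValue() : null;
    }

    void put(K key, V value) {
        // immutable entry, single volatile store: safe under concurrency
        slots.set(indexOf(key), Map.entry(key, value));
    }
}
```

For a ~20-entry working set this trades some hit rate for zero coordination between threads, which is plausibly a fine deal for the two call sites mentioned above.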

(Similar situation for ConcurrentReferenceHashMap)

ben-manes avatar Mar 09 '23 18:03 ben-manes

Oh, I didn't mean to imply replacing this with Caffeine.

Sure, I know you didn't, but that's what I'd like to allow people to do, implementation-wise. Or allow them to plug in something else.

So in Quarkus (for example), since we already have a Cache based on Caffeine, we'd reuse the same dependency for such purposes.

Totally agree about not wanting too many dependencies, but this is about making good use of the ones already there.

Specifically in Quarkus, the Hibernate ORM extension already has a hard dependency on the Caffeine extension, so it would just be a matter of adapting the contracts and injecting an appropriately configured cache instance.

Sanne avatar Mar 10 '23 11:03 Sanne