
Swaps in ConcurrentLru for LruCache.

Open MicahZoltu opened this issue 4 years ago • 7 comments

The current LruCache uses locks for thread synchronization, which can be quite slow. This change swaps in a concurrent LRU instead.

I chose to just wrap the existing LruCache to minimize the changes in this PR. In a future PR, it would be reasonable to just use ConcurrentLru directly, though that would require adding it as a dependency to a lot of projects.
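The wrapping approach described above might look something like the sketch below. This is illustrative only: the `ConcurrentLruCache` name and its method surface are assumptions chosen to mirror a typical cache interface, not Nethermind's actual `LruCache` API, and the `BitFaster.Caching` calls are used as I understand that library's public surface.

```csharp
using BitFaster.Caching.Lru;

// Hypothetical adapter: wraps BitFaster's ConcurrentLru behind a shape similar
// to the existing LruCache so call sites need minimal changes. The method
// names here are illustrative, not Nethermind's actual interface.
public class ConcurrentLruCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentLru<TKey, TValue> _inner;

    public ConcurrentLruCache(int maxCapacity)
    {
        // No lock is taken here or in the methods below; ConcurrentLru
        // handles thread synchronization internally.
        _inner = new ConcurrentLru<TKey, TValue>(maxCapacity);
    }

    public bool TryGet(TKey key, out TValue value) => _inner.TryGet(key, out value);

    public void Set(TKey key, TValue value) => _inner.AddOrUpdate(key, value);

    public bool Delete(TKey key) => _inner.TryRemove(key);
}
```

Because the adapter keeps the old call shape, a later PR could delete it and use `ConcurrentLru` directly once the dependency is referenced everywhere it is needed.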

Note: I don't know Nethermind's view on third party libraries. The referenced library is open source and MIT so we could just copy it over rather than referencing it via NuGet, but that is marginally more work and there is a good chance this change doesn't get accepted so I chose to do less work initially.

Note: The LruCache `name` constructor parameter was never used, so I have removed it as part of this PR, since I was already touching all of the constructor calls due to the removal of `startCapacity`.

MicahZoltu avatar Apr 14 '21 08:04 MicahZoltu

Note: I did this because I mistakenly believed there was a race condition that may be leading to data corruption. Eventually I realized that wasn't the case but I already had done most of the leg work of this PR so I figured I would follow through with it since it may provide a performance improvement still.

MicahZoltu avatar Apr 14 '21 08:04 MicahZoltu

MIT is fine. Can you provide a benchmark in the Benchmarks solution that shows and proves the performance difference?

LukaszRozmej avatar Apr 14 '21 08:04 LukaszRozmej

What exactly would you like to see in a performance test? One could contrive a performance test that makes this new solution look great, and one could contrive a performance test that makes it look like no change or a step backward. 😄 Do you have any existing performance tests that could be run that are somewhat representative of the real-world scenarios Nethermind encounters?

It is worth noting that in some situations a plain dictionary guarded by a lock is faster than a ConcurrentDictionary, due to the latter's additional GC pressure. However, those scenarios tend to be fairly contrived and not representative of most real-world applications. While this isn't quite the same situation, it illustrates how benchmarking can be misleading if you aren't careful to benchmark a real-world situation.
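For concreteness, these are the two patterns being contrasted. The class and method names below are illustrative, not from any real codebase; which pattern wins depends on contention and allocation behavior, so only a realistic workload can decide.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Pattern 1: a plain Dictionary behind a single lock. Simple and
// allocation-light, but every reader and writer contends on one gate.
public class LockedMap<TKey, TValue> where TKey : notnull
{
    private readonly object _gate = new object();
    private readonly Dictionary<TKey, TValue> _map = new Dictionary<TKey, TValue>();

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_gate) return _map.TryGetValue(key, out value);
    }

    public void Set(TKey key, TValue value)
    {
        lock (_gate) _map[key] = value;
    }
}

// Pattern 2: ConcurrentDictionary. No explicit lock at the call site; the
// dictionary stripes its own synchronization, at the cost of extra internal
// allocations that can add GC pressure.
public class ConcurrentMap<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, TValue> _map =
        new ConcurrentDictionary<TKey, TValue>();

    public bool TryGet(TKey key, out TValue value) => _map.TryGetValue(key, out value);

    public void Set(TKey key, TValue value) => _map[key] = value;
}
```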

As for off-the-shelf benchmarks, you can check out the benchmarks published by the project, which include a comparison against a "ClassicLru" implementation (an optimized version of the LRU Nethermind currently uses): https://github.com/bitfaster/BitFaster.Caching#concurrentlru-throughput
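If a benchmark in the Benchmarks solution is wanted, a BenchmarkDotNet harness along these lines would be a starting point. This is a sketch under assumptions: the class and method names are invented for illustration, only the ConcurrentLru side is shown (the existing LruCache would get a parallel `[Benchmark]` method), and real measurements should use a workload and key distribution representative of Nethermind's actual access patterns.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BitFaster.Caching.Lru;

// Hypothetical harness measuring read throughput of ConcurrentLru over a
// fixed key set. A matching benchmark method for the existing lock-based
// LruCache would be added alongside for a direct comparison.
[MemoryDiagnoser]
public class CacheReadBenchmark
{
    private ConcurrentLru<int, string> _concurrentLru;

    [GlobalSetup]
    public void Setup()
    {
        // Note: ConcurrentLru partitions its capacity into internal queues,
        // so some of these entries may be evicted even below total capacity.
        _concurrentLru = new ConcurrentLru<int, string>(1024);
        for (int i = 0; i < 256; i++)
            _concurrentLru.AddOrUpdate(i, i.ToString());
    }

    [Benchmark]
    public int ConcurrentLruHits()
    {
        int hits = 0;
        for (int i = 0; i < 256; i++)
            if (_concurrentLru.TryGet(i, out _)) hits++;
        return hits;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<CacheReadBenchmark>();
}
```

Running this on the branch and on master with the same key distribution would give the apples-to-apples numbers asked for above.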

MicahZoltu avatar Apr 14 '21 09:04 MicahZoltu

Depends on #2995

tkstanczak avatar Apr 21 '21 17:04 tkstanczak

I think this can actually be closed? IIRC, you indicated someone else had a PR that did this way better (which I'm totally happy with).

MicahZoltu avatar Apr 21 '21 19:04 MicahZoltu

@MicahZoltu we are reviving our benchmarks; when that's done we can properly check this change. Let's wait for it.

LukaszRozmej avatar Apr 21 '21 21:04 LukaszRozmej

We need to add some benchmarks before accepting this

LukaszRozmej avatar Jun 25 '21 08:06 LukaszRozmej

Resolved by https://github.com/NethermindEth/nethermind/pull/6417 and https://github.com/NethermindEth/nethermind/pull/5251

benaadams avatar Jan 30 '24 12:01 benaadams