nearcore
nearcore copied to clipboard
feat: limit trie cache by memory consumption
Instead of checking the number of values and their sizes, the caches are now limited by the actual (approximated) memory consumption.
This changes what total_size
in TrieCacheInner
means, which is also
observable through Prometheus metrics.
Existing configuration works with slightly altered effects. Number of entries convert to an implicit size limit. Since the explicit default size limit currently is 3GB and the default max entries is set 50k, the implicit limit = 50k * 1000B = 50MB is stronger. This still limits the number of largest entries to 50k but allows the cache to be filled with more entries when the values are smaller.
For shard 3, however, where the number of entries is set to 45M in code, the memory limit of 3GB is active. Since we change how this limit is calculated we will see fewer entries cached with this change. Shard 3 should still be okay since we have a prefetcher in place now that works even when the cache is empty.
If we merge this change first, I can finalise #7578 without adding temporary code only to deal with capacity configuration that we want to ditch anyway.
Thanks for the review @matklad! I think I addressed all your comments.
cc also @Longarithm, to be very clear here, this change is going to have an effect on existing shard caches.
- The old config of 50k per slot is now converted to 50k * 1kB = 50MB. If it means we fit in more than 50k (most likely we will) then there will be more entries in the shard cache now.
- The old config for shard 3, 3GB / 45M slots, is still limited by 3GB, but now factoring in the 100B overhead. So it will effectively contain less in the cache.
I believe this should all be okay. But I am not going to merge this without your approval on these two specific changes @Longarithm :)
Note to self: Add a comment in CHANGELOG after rebasing
I'll leave a review on Monday/Tuesday UPD: let it be Wednesday because today I took another deep look on https://github.com/near/nearcore/pull/7445, and there is no rush
LGTM. Is it right that after this cache for shard 3 will hold less entries, but we don't care much because we have prefetching as well?
yes, that's exactly right