nearcore icon indicating copy to clipboard operation
nearcore copied to clipboard

feat: limit trie cache by memory consumption

Open jakmeier opened this issue 2 years ago • 4 comments

Instead of checking the number of values and their sizes, the caches are now limited by the actual (approximated) memory consumption.

This changes what total_size in TrieCacheInner means, which is also observable through Prometheus metrics.

Existing configuration works with slightly altered effects. Number of entries convert to an implicit size limit. Since the explicit default size limit currently is 3GB and the default max entries is set 50k, the implicit limit = 50k * 1000B = 50MB is stronger. This still limits the number of largest entries to 50k but allows the cache to be filled with more entries when the values are smaller.

For shard 3, however, where the number of entries is set to 45M in code, the memory limit of 3GB is active. Since we change how this limit is calculated we will see fewer entries cached with this change. Shard 3 should still be okay since we have a prefetcher in place now that works even when the cache is empty.

jakmeier avatar Oct 03 '22 10:10 jakmeier

If we merge this change first, I can finalise #7578 without adding temporary code only to deal with capacity configuration that we want to ditch anyway.

jakmeier avatar Oct 03 '22 10:10 jakmeier

Thanks for the review @matklad! I think I addressed all your comments.

cc also @Longarithm, to be very clear here, this change is going to have an effect on existing shard caches.

  • The old config of 50k per slot is now converted to 50k * 1kB = 50MB. If it means we fit in more than 50k (most likely we will) then there will be more entries in the shard cache now.
  • The old config for shard 3, 3GB / 45M slots, is still limited by 3GB, but now factoring in the 100B overhead. So it will effectively contain less in the cache.

I believe this should all be okay. But I am not going to merge this without your approval on these two specific changes @Longarithm :)

jakmeier avatar Oct 04 '22 15:10 jakmeier

Note to self: Add a comment in CHANGELOG after rebasing

jakmeier avatar Oct 04 '22 15:10 jakmeier

I'll leave a review on Monday/Tuesday UPD: let it be Wednesday because today I took another deep look on https://github.com/near/nearcore/pull/7445, and there is no rush

Longarithm avatar Oct 14 '22 12:10 Longarithm

LGTM. Is it right that after this cache for shard 3 will hold less entries, but we don't care much because we have prefetching as well?

yes, that's exactly right

jakmeier avatar Oct 21 '22 06:10 jakmeier