schnellru icon indicating copy to clipboard operation
schnellru copied to clipboard

It seems that schnellru does not count heap memory, such as String, Vec

Open chengshuyi opened this issue 11 months ago • 4 comments

If the value of LruMap is String, its memory_usage statistics are not very accurate, and it seems that the size of the heap memory allocated by String is not included.

    let mut lru: LruMap<u32, String, ByMemoryUsage> = LruMap::with_memory_budget(100* 1024 * 1024);

    for i in 0..1024 {
        let s: String = std::iter::repeat('1').take(1024).collect();
        lru.insert(i, s);
        println!("{}", lru.memory_usage());
    }

chengshuyi avatar Jan 22 '25 01:01 chengshuyi

This works as expected, as there's no standard way of getting the amount of heap memory used by an arbitrary T. If you want to also count the heap memory you need to implement your own Limiter. (Just copy-paste the ByMemoryUsage one and also include the heap memory in the calculations.)

koute avatar Jan 22 '25 05:01 koute

This works as expected, as there's no standard way of getting the amount of heap memory used by an arbitrary T. If you want to also count the heap memory you need to implement your own Limiter. (Just copy-paste the ByMemoryUsage one and also include the heap memory in the calculations.)

Is that right? The following is the processing of on_insert, and the others are similar.

 #[inline]
    fn on_insert(&mut self, _length: usize, key: Self::KeyToInsert<'_>, value: V) -> Option<(K, V)> {
        self.size += value.size();
        Some((key, value))
    }

chengshuyi avatar Jan 23 '25 02:01 chengshuyi

If I understand the proposed solution correctly, isn't on_insert called after on_grow, so the self.size + new_memory_usage that you would calculate in on_grow would then include all the sizes of the elements as calculated by core::allocator::Layout and all the elements "heap size" except the last, about to be inserted element? This means that we could end up using more memory than allowed by the last element's "heap size".

Wouldn't it be possible to instead have the Limiter trait have another method, size_of that the user of the library can override too? That way we don't have to manage that mutable state but just do something simple like element.len() for a Vec (we might not even care about byte size, which could be pretty cool, we could limit on other things like open file descriptors or whatever)

gardell avatar Apr 01 '25 07:04 gardell

Here's an example of how to implement the trait to also limit heap usage:

https://github.com/paritytech/polkadot-sdk/blob/c6249dca5928c12c35c577b971277d4211c928b7/substrate/primitives/trie/src/cache/shared_cache.rs#L63

isn't on_insert called after on_grow

No. on_insert is called before on_grow. If on_insert returns false then no attempt is made to allocate or grow the underlying map.

Wouldn't it be possible to instead have the Limiter trait have another method, size_of that the user of the library can override too? That way we don't have to manage that mutable state

No. The trait deliberately is very low level to allow for as much flexibility as possible, and it doesn't actually explicitly keep track of any memory usage (we don't do a size += size_of::<T>() anywhere internally), so adding something like that would just make it strictly less powerful and have more overhead. The new_memory_usage you see in on_grow is dynamically calculated based on how much memory the underlying hash map will use after it's resized.

If you don't want to manage any mutable state you can always make yourself a wrapper struct + a helper trait that will manage it for you.

koute avatar Apr 01 '25 07:04 koute