
feat: improved shard cache

Open Longarithm opened this issue 3 years ago • 0 comments

Improve shard cache to use RAM more effectively.

Three changes are introduced:

  1. When a new value is put into the LRU cache and the total size of the existing values exceeds total_sizes_capacity, we evict values until that is no longer the case. The actual total size should therefore never exceed total_size_limit + TRIE_LIMIT_CACHED_VALUE_SIZE. We add this because value sizes generally vary from 1 B to 500 B and we want to account for the cache size precisely. The current value size limit is 1000 B, so with an average value size of 100 B the shard cache is used about 10x more effectively than if every entry were budgeted at the full limit.

  2. Previously, saving trie changes only applied insertions to the shard cache, that is, it added the newly created nodes. Deletions were applied only during GC of the old block. Now we also apply deletions, calling pop on the shard cache, when saving trie changes of a new block. This uses shard cache space more effectively: previously, nodes from the old state could occupy a lot of space, which led to eviction of nodes from the fresh state.

  3. When pop is called on the shard cache, the item is not deleted immediately but is first put into a deletions queue with capacity deletions_queue_capacity. If the popped item doesn't fit in the queue, the oldest item is removed from both the queue and the LRU cache, and the newly popped item is inserted into the queue. This delays removals, which is needed when we have forks. In the simple case, two blocks may share a parent P. When we process the first block, we call pop for some nodes from P, but when we process the second block, we may still need to read some of those nodes. We now delay removal by 100_000 items, which helps to keep all nodes from the last 3 completely full blocks.
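
To make the interaction between the size limit and the delayed deletions concrete, here is a minimal sketch. It is not the nearcore implementation: `ShardCacheSketch`, its `u64` keys, and the O(n) LRU bookkeeping are hypothetical simplifications, and skipping values at or above `TRIE_LIMIT_CACHED_VALUE_SIZE` is an assumption about how the per-value limit is enforced.

```rust
use std::collections::{HashMap, VecDeque};

/// Per-value size limit from the description (1000 B); the name is reused for illustration.
pub const TRIE_LIMIT_CACHED_VALUE_SIZE: usize = 1000;

/// Hypothetical toy cache: LRU bounded by total value size, with delayed deletions.
pub struct ShardCacheSketch {
    values: HashMap<u64, Vec<u8>>,
    /// Keys ordered from least to most recently used.
    lru_order: VecDeque<u64>,
    /// Keys for which `pop` was called but whose removal is still delayed.
    deletions: VecDeque<u64>,
    total_size: usize,
    total_size_limit: usize,
    deletions_queue_capacity: usize,
}

impl ShardCacheSketch {
    pub fn new(total_size_limit: usize, deletions_queue_capacity: usize) -> Self {
        Self {
            values: HashMap::new(),
            lru_order: VecDeque::new(),
            deletions: VecDeque::new(),
            total_size: 0,
            total_size_limit,
            deletions_queue_capacity,
        }
    }

    pub fn total_size(&self) -> usize {
        self.total_size
    }

    /// Removes a key right away, keeping the size accounting in sync.
    fn remove(&mut self, key: u64) {
        if let Some(value) = self.values.remove(&key) {
            self.total_size -= value.len();
            self.lru_order.retain(|k| *k != key);
        }
    }

    /// Reads a value and marks it as most recently used.
    pub fn get(&mut self, key: u64) -> Option<&Vec<u8>> {
        if self.values.contains_key(&key) {
            self.lru_order.retain(|k| *k != key);
            self.lru_order.push_back(key);
        }
        self.values.get(&key)
    }

    /// Change 1: evict by total size, not by entry count.
    pub fn put(&mut self, key: u64, value: Vec<u8>) {
        // Assumption: values at or above the per-value limit are never cached.
        if value.len() >= TRIE_LIMIT_CACHED_VALUE_SIZE {
            return;
        }
        self.remove(key); // replace any previous value for this key
        // Evict least recently used values while the existing values exceed the limit.
        while self.total_size > self.total_size_limit {
            match self.lru_order.pop_front() {
                Some(oldest) => {
                    if let Some(evicted) = self.values.remove(&oldest) {
                        self.total_size -= evicted.len();
                    }
                }
                None => break,
            }
        }
        // After this insert: total_size <= total_size_limit + TRIE_LIMIT_CACHED_VALUE_SIZE.
        self.total_size += value.len();
        self.values.insert(key, value);
        self.lru_order.push_back(key);
    }

    /// Change 3: delay the removal by parking the key in a bounded queue.
    pub fn pop(&mut self, key: u64) {
        if self.deletions_queue_capacity == 0 {
            self.remove(key);
            return;
        }
        if self.deletions.len() == self.deletions_queue_capacity {
            // Queue is full: the oldest delayed deletion is applied now.
            if let Some(oldest) = self.deletions.pop_front() {
                self.remove(oldest);
            }
        }
        self.deletions.push_back(key);
    }
}
```

In this sketch a popped node stays readable until enough later pops push it out of the queue, which is what lets a second block on a fork still read nodes of the shared parent P.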

Next steps:

  • make the new constants configurable, similarly to the trie cache capacity;
  • add new metrics to Prometheus, similarly to https://github.com/near/nearcore/pull/7439.

We want to get the whole update merged by next Wednesday, and cherry-pick it to 1.28 and 1.29 releases. This is not a protocol change, so it doesn't require a separate release or protocol version.

Testing

  • Tests for BoundedQueue, which keeps the queue of trie deletions
  • Tests for TrieCache, whose logic is now less trivial
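
As an illustration of the kind of property the BoundedQueue tests might check, here is a minimal sketch. The type below is a hypothetical stand-in, not the nearcore `BoundedQueue`: the `put` signature, returning the displaced item, and the zero-capacity behaviour are assumptions made for the example.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity FIFO queue used to delay trie deletions.
pub struct BoundedQueue<T> {
    queue: VecDeque<T>,
    capacity: usize,
}

impl<T> BoundedQueue<T> {
    pub fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), capacity }
    }

    pub fn len(&self) -> usize {
        self.queue.len()
    }

    /// Adds `item` and returns the item that had to leave the queue, if any.
    /// The caller is expected to delete the returned item from the cache.
    pub fn put(&mut self, item: T) -> Option<T> {
        if self.capacity == 0 {
            // Assumption: with no room to delay, the item is handed back at once.
            return Some(item);
        }
        let displaced = if self.queue.len() == self.capacity {
            self.queue.pop_front() // the oldest item leaves first
        } else {
            None
        };
        self.queue.push_back(item);
        displaced
    }
}

#[test]
fn put_displaces_oldest_item_when_full() {
    let mut queue = BoundedQueue::new(2);
    assert_eq!(queue.put(1), None);
    assert_eq!(queue.put(2), None);
    // The queue is full, so the oldest item (1) is displaced.
    assert_eq!(queue.put(3), Some(1));
    assert_eq!(queue.len(), 2);
}
```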

Longarithm • Aug 17 '22 16:08