ethereumjs-monorepo icon indicating copy to clipboard operation
ethereumjs-monorepo copied to clipboard

Discussion on improving concurrency of trie

Open s1na opened this issue 6 years ago • 3 comments

Modifications to the trie (put and del) are protected via a semaphore. The locking period however can be quite long, as after taking the semaphore put calls findPath and possibly _updateNode, both of which are IO-bound. This reduces throughput of updates significantly.

This issue aims to discuss two points:

  • What are the thread-safety requirements of the trie? i.e. What are the race conditions that have to be protected against?
  • Given the requirements, would it be possible to limit the use of semaphores to the absolute minimum or remove them completely to improve throughput?

Edit: Initial benchmarking results suggest this to be a bottleneck.

s1na avatar Jan 10 '19 08:01 s1na

My own guess with respects to the first question is that during the time when one "thread" finds the node it wants to update/put, another "thread" modifies another node on the path, making the second update obsolete (as it will be done on an outdated snapshot). Is this close to the rationale? are there other points?

If this is the case, modifying the current checkpoint mechanism in the following way could potentially increase concurrency for batch operations (single put/deletes would still need the lock) (#58 is also relevant here):

  1. Queue modifications
  2. Consumer calls commit/flush
  3. Take semaphore (and defer release)
  4. Get a read snapshot
  5. Apply all modifications in-memory on top of the snapshot
  6. Persist to db via an atomic db.batch

s1na avatar Jan 10 '19 09:01 s1na

// @vpulim @alextsg Would be great if you find some time to jump in here! 😄

holgerd77 avatar Jan 10 '19 09:01 holgerd77

Circling in @ScottyPoi here in case this might give some inspiration around possible smaller refactorings of the trie library.

holgerd77 avatar Aug 01 '23 11:08 holgerd77