stellard
stellard copied to clipboard
Temporal buckets for ledger objects
Ledgers (both state and history) are currently stored as fine-grained SHAMaps, one for transactions and one for the state of all accounts and other ledger-objects (orders etc.).
We would like to move away from using SHAMaps for both aspects of storing a ledger. This bug is about the account-state SHAMap. For the ledger-header and transaction-history representation, see bug #79
As a state-storage device for a set of accounts and offers, SHAMaps are more reasonable but still suffer from a lot of problems: the trie's internal nodes are subject to a high degree of churn (particularly bad when leaves cluster, as in orders, but significant even when leaves are uniformly spread through the tree), leading to a lot storage overhead and rehashing of internal state. Set reconciliation between states is very fine grained and carries a lot of overhead.
We would like to store (account-state / order) ledger objects in a small set of (say, 2-5) coarse-grained, temporally stratified buckets instead, similar to an LSM-Tree. Each bucket would contain a sorted set of objects recently modified, with each bucket overflowing and being merge-sorted into the next-level bucket, in batches, as it fills with older objects. Each bucket would be re-hashed (as a sequence) on each overflow-merge operation, and the complete account-state hash would be the composition of the bucket hashes, with the smallest / most-recently-changed bucket requiring rehashing most frequently and the larger buckets being rehashed much less frequently.
We believe this format would have significantly lower storage and transmission overheads. A certain number of nodes would exist in more than one bucket (copies in newer buckets shadowing those in older buckets) but otherwise each bucket could be stored either as a contiguous array or at very least a smaller set of B-tree like chunks of contiguous arrays.