smt icon indicating copy to clipboard operation
smt copied to clipboard

[Scalability Design Document] Design document and plan to test SMT scalability

Open Olshansk opened this issue 2 years ago • 2 comments

Objective

Put together a design document and a plan to determine the scalability, efficiency and required improvements we need to make to the SMT to make v1 as scalable as possible.

Goals

  • Establish the limits of our current SMT (read, write, storage)
  • Determine the limits for leaf sizes to accommodate things like websockets
  • Explore alternative key-value stores to badgerdb and compare insert/retrieval speeds
  • Investigate the potential benefits of extending the prefix trie from a binary tree to a k-ary tree
  • Understand the implications of proof size on data availability costs and estimate our costs

Deliverable

NOTE: The deliverables below are just a starting point

A design document / plan that'll address the following in different steps:

  • [x] Benchmarking data on the current SMT's insert/retrieval speeds and proof sizes. For example: - Can we insert 0.5/1/10 million leaves in 10/60/200 seconds? - What are the retrieval speeds depending on the tree size?
  • [ ] A report on the feasible leaf sizes for different protocols. - If we plan to support larger unhashed values (e.g. for websockets), is there a limit on the leaf size? - Do we need to split it into multiple smaller leaves as data is streamed?
  • [ ] Comparative analysis of different key-value stores alongside badgerdb, focusing on insert/retrieval speeds. - Could/should we add adapters for leveldb, gorocksdb, boltdb or others and rerun the benchmarks above?
  • [ ] Research and consideration if we should look into implementing k-ary support instead of just binary support in this tree?
  • [ ] Detailed breakdown of proof sizes and strategies to map this onto data availability costs. - How big are proofs for a tree of 0.5/1/10 million leaves? - How long should our sessions be based on expected volume per servicer per session? - What will the data availability cost on Celestia or other DA layers? d-js.github.io/mermaid/) diagrams

Creator: @Olshansk Co-Owners: @h5law

Olshansk avatar Sep 09 '23 00:09 Olshansk