specs icon indicating copy to clipboard operation
specs copied to clipboard

Request: Magic AMT

Open Stebalien opened this issue 5 years ago • 2 comments

I'd really love an AMT where:

  • The bitwidth wasn't fixed for all levels. E.g., start with a bitwidth of 3 and then increase by 1 bit (double the elements) per level. I'm not sure how to do this and avoid re-writing all nodes on resize.
  • The number of elements in the leaves was proportional to the size of the leaf, not the actual number of elements. This is far from easy, but it would be so nice.
  • The depth wasn't related to the maximum element. We'd need to include offsets to make this work, but that shouldn't be difficult.
  • Better support for sparse AMTs. HAMTs don't have this issue because they store KV pairs. AMTs can more efficiently pack values by not storing the keys. A solution here would be to map offsets to sequential ranges (also fixes the depth issue).

At the end of the day, I'm thinking something like (rough idea):

type AMT struct {
    ...
    node AMTNode
}

// This expands into an array indexed by "bitwidth" bits of the index.
type AMTNode struct {
    prefix     Int // I need _some_ way to say "this is a common prefix all elements share.
    bitmap   Bytes
    children [AMTElement]
}

// If an element is a shard, we lookup the key within that shard. Otherwise, we load the child node.
type AMTElement union {
    | &AMTNode link
    | AMTShard map
} representation kinded

// A shard is a sequence of values at some offset.
type AMTShard struct {
    offset Int
    values [Any]
}

The tricky part here is deciding how big these shards should be and deciding when they need to be popped out into their own nodes.

Edit: I think we're going to need an additional offset

Stebalien avatar Dec 03 '20 21:12 Stebalien

Use-cases:

  • Queue that starts at an arbitrary offset. Currently, this offset will force a minimum depth of log(offset). This is a problem for pretty much all of our epoch-based queues in filecoin. Unfortunately, we also need random insertion/deletion.
  • AMT of really small objects. We want to create an AMT that just stores deal IDs, but these are so small that the current branching factor of 8 leads to ridiculously small leaves. Our only solution at the moment is to arbitrarily make leaves larger.
  • AMTs that efficiently store ranges at random offsets. This is basically what our sectors AMT looks like as miners will allocate batches of sectors at different offsets. Unfortunately, this means we end up with really deep AMT.

Stebalien avatar Dec 03 '20 21:12 Stebalien

I keep on thinking about this as a way to iterate on the current AMT https://github.com/filecoin-project/go-amt-ipld/issues/17#issuecomment-668568876, I just haven't had space to actually try and implement it. Keeping current semantics in place but adding the offset to allow for vertical compaction might end up making the structure quite reasonable for most of these use-cases. Bringing the branching factor up closer to the HAMT which is at 32, would probably help too.

rvagg avatar Dec 07 '20 00:12 rvagg