
node/bindnode: support binding a plain Go map to an IPLD map

Open · mvdan opened this issue 3 years ago · 1 comment

Right now, we implement ordered maps in the form of:

```go
struct {
    Keys   []K
    Values map[K]V
}
```

This is because maps in IPLD must have a stable order:

> Iteration order is defined to be stable: two separate MapIterators created to iterate the same Node will yield the same key-value pairs in the same order. The order itself may be defined by the Node implementation: some Nodes may retain insertion order, and some may return iterators which always yield data in sorted order, for example.

What we implement now with the struct above is an ordered map that keeps insertion order. However, in many common use cases like graphsync, that's unnecessary; they just want a map to encode and decode as dag-cbor. In that case, a plain Go map that sorts its keys when MapIterator gets called should be enough.
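To make the idea concrete, here is a minimal sketch of that approach; the type and method names are hypothetical illustrations, not bindnode's actual API. The map stays a plain Go map, and stable iteration order is recovered by copying and sorting the keys each time an iterator is created:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedMapNode is a hypothetical wrapper around a plain Go map that
// yields keys in sorted order, satisfying IPLD's stable-order rule
// without storing a separate Keys slice.
type sortedMapNode struct {
	m map[string]int
}

// mapIterator copies and sorts the keys on each call: O(n log n) per
// iterator, but the underlying storage stays a plain Go map.
func (n sortedMapNode) mapIterator() func() (string, int, bool) {
	keys := make([]string, 0, len(n.m))
	for k := range n.m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	i := 0
	return func() (string, int, bool) {
		if i >= len(keys) {
			return "", 0, false
		}
		k := keys[i]
		i++
		return k, n.m[k], true
	}
}

func main() {
	n := sortedMapNode{m: map[string]int{"b": 2, "c": 3, "a": 1}}
	next := n.mapIterator()
	for k, v, ok := next(); ok; k, v, ok = next() {
		fmt.Println(k, v)
	}
}
```

Sorted-order iteration (rather than insertion order) is what makes this workable: it is deterministic from the map contents alone, so no order needs to be stored.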

Sorting the keys every time MapIterator gets called adds some overhead, both in copying the keys and in sorting them, but:

  • The current ordered map already duplicates the keys
  • We could alleviate the cost of copying the keys and sorting them by caching the result in the Node wrapper
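The caching idea in the second bullet could look something like the following sketch (again with hypothetical names, assuming the map is not mutated after the Node wrapper is created): sort the keys once on first use, then reuse the cached slice for every subsequent iterator.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// cachedMapNode caches the sorted key slice in the wrapper, so the
// copy-and-sort cost is paid once rather than per iterator. This
// assumes the underlying map is effectively immutable once wrapped.
type cachedMapNode struct {
	m    map[string]int
	once sync.Once
	keys []string
}

// sortedKeys sorts lazily on first call and reuses the result after.
func (n *cachedMapNode) sortedKeys() []string {
	n.once.Do(func() {
		n.keys = make([]string, 0, len(n.m))
		for k := range n.m {
			n.keys = append(n.keys, k)
		}
		sort.Strings(n.keys)
	})
	return n.keys
}

func main() {
	n := &cachedMapNode{m: map[string]int{"z": 26, "a": 1}}
	// The first call sorts; the second reuses the cached slice.
	fmt.Println(n.sortedKeys())
	fmt.Println(n.sortedKeys())
}
```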

cc @warpfork @rvagg @hannahhoward

mvdan · Jan 27 '22 10:01

Hm. One non-issue: allocation amortization. It's a noteworthy concern where this pattern appears in the gogen code generator and in the basicnode implementation, but I don't think it applies here.

(In detail: if K is going to be returned as an ipld.Node, having it in one big slab of memory -- the Keys slice -- means the whole thing can escape to the heap one time, which can save a lot of allocations that would otherwise occur. If those values needed to be boxed into an ipld.Node interface one by one, and didn't all come from one contiguous block of memory like that, iteration would cost an alloc per step, which is not good.)

But this doesn't apply to bindnode: bindnode always has to return a new object for wrapping the key value, regardless, because the K type doesn't implement ipld.Node itself.

In fact, I guess making one []bindnode._node slab and wiring its contents up to contain all the keys may end up doing this amortization nicely, whereas (I think) currently there is no amortization, so this should... end up being a net win, often? (The wrapper allocs are O(n) vs sorting being O(n log n), but I think the constant factor on allocations is quite a bit higher than the constant factors for comparison and swapping in the sorting.)
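For readers unfamiliar with the slab trick being described, here is a small standalone illustration (nodeWrapper and wrapKeys are hypothetical stand-ins, not bindnode internals): allocating all wrappers in one backing slice means one heap allocation for the whole set, instead of one allocation per key when each wrapper is boxed into an interface individually.

```go
package main

import "fmt"

// nodeWrapper stands in for an internal wrapper type like
// bindnode._node (hypothetical here).
type nodeWrapper struct {
	key string
}

func (w *nodeWrapper) Key() string { return w.key }

type node interface{ Key() string }

// wrapKeys allocates a single slab of wrappers and returns interface
// values that all point into it: one heap allocation for the slab
// instead of one per key.
func wrapKeys(keys []string) []node {
	slab := make([]nodeWrapper, len(keys)) // one allocation for all wrappers
	out := make([]node, len(keys))
	for i, k := range keys {
		slab[i].key = k
		out[i] = &slab[i] // pointer into the slab; no per-key boxing alloc
	}
	return out
}

func main() {
	for _, n := range wrapKeys([]string{"a", "b", "c"}) {
		fmt.Println(n.Key())
	}
}
```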

warpfork · Jan 31 '22 11:01