go-ipld-prime icon indicating copy to clipboard operation
go-ipld-prime copied to clipboard

Get Keys of a map without realizing values

Open willscott opened this issue 4 years ago • 5 comments

Currently, the access pattern API provided for maps is to create a MapIterator which iterates through the items of the map. It would be useful to be able to iterate through the keys of the map, without having to allocate and potentially construct the corresponding value ipld.Node objects for each.

willscott avatar Nov 10 '20 21:11 willscott

Could the same apply to keys? For example, if I only wanted the values, would I save work by not realizing/building the key nodes?

If so, then I'd suggest to put both ipld.Node values behind functions, like:

type MapIterator interface {
    Next() (KeyValue, error)
    [...]
}

type KeyValue interface {
    Key() Node
    Value() Node
}

As long as we say that the returned KeyValue is only valid until the next iteration, I don't think this will force any extra allocations, becacuse we can simply reuse a single stack-allocated KeyValue implementation.

I'm mostly ignoring error handling above - I assume that the Key and Value methods would need to return errors, and perhaps Next could never return an error and simply panic if it's called once Done returns true.

I'm mostly picking up a couple of ideas from https://pkg.go.dev/github.com/dgraph-io/badger/v2#Iterator.

mvdan avatar Nov 16 '20 15:11 mvdan

A KeyIterator() and ValuesIterator() would be nice, it would match a common pattern used in collections across the programming landscape

rvagg avatar Nov 17 '20 01:11 rvagg

I've wondered if we'd be able to get away with not needing this, but it seems like the consensus is no, we need it :) So, okay.

Agree KeyIterator() and ValuesIterator() sound like one valid option.

Agree MapIterator{ Next() KV } with delayed and separated key and value extraction is another option. (Although syntactically somewhat nasty to use, perhaps; would take multiple lines of user code to fully destructure that, which is notably burdensome.)

I haven't thought on this for long enough to form a preference between those two either way yet.

The other concern to balance is probably binary and generated-source (or boilerplate, if a node is handwritten) size for the approach we take. Assuming we add these features to the central ipld.Node interface (or one of the other interfaces like ipld.MapIterator that's also directly connected to ipld.Node), we force the need to implement this on every Node implementation everywhere.

warpfork avatar Nov 17 '20 14:11 warpfork

To me, the advantage of keeping a single iterator instead of splitting into KeyIterator and ValueIterator is that at each iteration one can decide what nodes need to be obtained. For example, if I only need to obtain the value node for a small subset of the keys.

If one needs both keys and values, it's also unclear how one would use KeyIterator and ValueIterator at once.

mvdan avatar Nov 17 '20 14:11 mvdan

Well to me, a major advantage of special KeyIterator() and ValueIterator() is that they become interchangeable with the iterator obtained from a List type. Which should introduce some interesting consumption interface patterns. Having a weird "only realised if you touch it" model means that everything in here is special-case.

rvagg avatar Nov 18 '20 05:11 rvagg