d3-selection icon indicating copy to clipboard operation
d3-selection copied to clipboard

Make .data() binding work with iterable protocol

Open bsidhom opened this issue 11 months ago • 3 comments

The current data binding implementation relies on random array-style access, but only ever accesses data sequentially. Some data structures (such as linked lists) cannot easily expose random access. Moreover, since this specifically uses bracket indexing, it only accepts Arrays or first-class array-like objects. Making a custom data structure indexable this way requires either an expensive Proxy (indexes have to be round-tripped from integers to strings on every access) or else requires the custom class to add explicit integer properties for each contained item; this adds memory bloat linear in the size of the data structure itself and also requires unnecessary bookkeeping.

If random access were required, it might be reasonable to require some cheaply-implemented but customizable accessor method (see, for example Array.at()). However, since this is not the case, it makes more sense to instead only require that input data types be iterable. This opens up efficient implementations for structures such as singly-linked lists and other purely applicative structures. Backward compatibility can be maintained by adding a cheap wrapper to implement iteration in terms of sequential indexing.

bsidhom avatar Dec 18 '24 03:12 bsidhom

Note that this is just a sketch of the suggested functionality. I have not added the shim required to make this backward compatible with code that implements random bracketed access but is not iterable.

There's another question as to whether .data() should also require a .length method or whether it's sufficient to compute that dynamically during iteration. As far as I can tell, this is essentially used as a proxy to infer whether we're dealing with an array-like object. Ideally, this method would only check for the Symbol.iterator property and use that for iteration; sadly, that breaks compatibility.

bsidhom avatar Dec 18 '24 03:12 bsidhom

I'm curious, what limitation of Arrays are you facing?

Is there something wrong with converting the data to an Array first?

curran avatar Dec 18 '24 14:12 curran

The main issue is just the expense of repeatedly converting to arrays at each render step. I'm working on some visualizations either using fully persistent data structures (which heavily use linking and structural sharing) or of those same data structures. It turns out that visualizing the internals of the data structures themselves is fine because any given element should only have $O(1)$ links. However, it becomes quite expensive to convert the entire data structure to a temporary array every render/update. That work is then thrown out because the temporary array isn't even retained, but discarded once the underlying enter/update/exit arrays, etc. are created.

On the other hand, I'm starting to realize that using the d3 join verbs is not necessarily the best way to spell this given that I want to use incremental updates with structural updates. It might make sense to use d3 for display only and roll my own diff/incremental update operation that aligns more closely with my data structures.

In any case, switching data binding to use iteration rather than random access in the interface more closely aligns with what d3 is already doing under the hood, requires a weaker contract from callers, and makes it easier to adapt for custom data structures, with minimal computational and storage overhead.

bsidhom avatar Dec 28 '24 23:12 bsidhom