Consider dropping support for multi-dimensional bins
Our C++ implementation, based on MultiIndex and variable::transform, supports bins with a multi-dimensional content buffer. Users rarely (if ever) rely on this in practice. The main current uses are internal implementation details, such as boolean indexing and indexing by integer lists.
In #3044 we are investigating performance problems that ultimately tie into the per-bin overhead of MultiIndex. This does not just affect integer-list indexing, but any "event data" operations with very few events per bin.
A secondary problem is the extremely high complexity of MultiIndex, which potentially also affects compile times and binary sizes.
If we dropped support for this (provided that the cases where it is used internally can be addressed in another manner), MultiIndex could be simplified a lot. Here is a potential solution:
- Assume we limit ourselves to "1-D" content buffers for binned data, with stride 1.
- variable::transform has special branches (for the purpose of optimization) for predefined stride combinations.
- When iterating binned and dense data, this corresponds to stride-1 iteration within a bin and stride-0 for the dense operands. Therefore, MultiIndex would not need to handle this, except for loading the bin-start and bin-size. The rest could be handled by the stride-optimization branches in variable::transform.
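
To make the stride argument concrete, here is a minimal sketch of the proposed split. It is not the actual scipp code: `inner_loop`, `BinRange`, and `transform_binned_dense` are hypothetical names, and the content buffer is assumed to be a contiguous 1-D array of `double`. The outer loop only loads the bin-start and bin-size, while the inner loop is a generic kernel instantiated for the stride combination (1, 0).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stride-specialized branch: the strides are
// compile-time constants, so the compiler can optimize each combination.
template <std::ptrdiff_t StrideOut, std::ptrdiff_t StrideIn, class Op>
void inner_loop(double *out, const double *in, const std::ptrdiff_t n, Op op) {
  for (std::ptrdiff_t i = 0; i < n; ++i)
    op(out[i * StrideOut], in[i * StrideIn]);
}

// Assumed bin representation: half-open index range [begin, end) into the
// 1-D content buffer.
using BinRange = std::pair<std::ptrdiff_t, std::ptrdiff_t>;

// Applies op(content_element, dense_value) for every element of every bin.
// `dense` holds one value per bin.
template <class Op>
void transform_binned_dense(std::vector<double> &content,
                            const std::vector<BinRange> &bins,
                            const std::vector<double> &dense, Op op) {
  for (std::size_t bin = 0; bin < bins.size(); ++bin) {
    // The only per-bin work: load bin-start and bin-size.
    const auto [begin, end] = bins[bin];
    // Stride 1 within the bin, stride 0 for the dense operand.
    inner_loop<1, 0>(content.data() + begin, dense.data() + bin, end - begin,
                     op);
  }
}
```

For example, calling `transform_binned_dense` with a lambda such as `[](double &x, const double s) { x *= s; }` scales every event in a bin by that bin's dense value. Empty bins cost one range load and a zero-length inner loop.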
Would simplifying MultiIndex mean that we could now implement negative strides more easily?
> Would simplifying MultiIndex mean that we could now implement negative strides more easily?
Yes, almost certainly.
Note also #2634, which shows an existing shortcoming, i.e., multi-dimension-bin support is mediocre anyway.