fast-kmeans
Moving to 2-d array representations
The Dataset class uses a 1-dimensional array which is indexed like a 2-dimensional array almost everywhere. It would be good to switch to an explicit 2-dimensional indexing scheme.
This could be done either by converting the data array into a 2-dimensional array (double pointer) while still allowing direct access to it, or by replacing every direct access of the form Dataset::data[r * d + c] with a call to Dataset::at(r, c), where r and c are the row and column, respectively (thereby disallowing direct access to the underlying data array).
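A minimal sketch of the second option, assuming the data stays in one row-major buffer of n points with d dimensions each. The private members and the rows()/cols() helpers are hypothetical illustrations, not the actual fast-kmeans Dataset interface; the point is only that callers go through at(r, c) and never see the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal stand-in for Dataset: n rows (points), d columns
// (dimensions), stored row-major in one contiguous 1-d buffer.
class Dataset {
public:
    Dataset(std::size_t n, std::size_t d) : n_(n), d_(d), data_(n * d, 0.0) {}

    // Replaces direct use of data[r * d + c]; bounds are checked in debug builds.
    double &at(std::size_t r, std::size_t c) {
        assert(r < n_ && c < d_);
        return data_[r * d_ + c];
    }
    double at(std::size_t r, std::size_t c) const {
        assert(r < n_ && c < d_);
        return data_[r * d_ + c];
    }

    std::size_t rows() const { return n_; }
    std::size_t cols() const { return d_; }

private:
    std::size_t n_, d_;
    std::vector<double> data_;  // no longer reachable from outside the class
};
```

Because the indexing formula now lives in a single place, changing the storage layout later only requires touching at(), not every call site.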
If we do the latter, we abstract the actual representation even further, making the code easier to adapt. In fact, we could have Dataset1d and Dataset2d classes that are both subclasses of Dataset, where Dataset1d works just like what we have now and Dataset2d uses a 2-d representation. Then we could choose which one to use when we receive some data. (We could even go so far as to develop subclasses supporting column-major vs. row-major layouts, but that's probably overkill.)
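A rough sketch of the subclass idea, assuming a pure virtual at(r, c) on the base class. Only the class names Dataset, Dataset1d and Dataset2d come from the issue; the public n/d members, the use of std::vector<std::vector<double>> as a stand-in for a double-pointer 2-d array, and the sumAll() helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface: algorithm code only calls at(r, c), so it does
// not depend on how the values are laid out in memory.
class Dataset {
public:
    Dataset(std::size_t n, std::size_t d) : n(n), d(d) {}
    virtual ~Dataset() = default;
    virtual double &at(std::size_t r, std::size_t c) = 0;
    std::size_t n, d;  // number of points, number of dimensions
};

// Current layout: one contiguous row-major buffer.
class Dataset1d : public Dataset {
public:
    Dataset1d(std::size_t n, std::size_t d) : Dataset(n, d), data(n * d, 0.0) {}
    double &at(std::size_t r, std::size_t c) override { return data[r * d + c]; }
private:
    std::vector<double> data;
};

// 2-d layout: one separately allocated row per point.
class Dataset2d : public Dataset {
public:
    Dataset2d(std::size_t n, std::size_t d)
        : Dataset(n, d), rows(n, std::vector<double>(d, 0.0)) {}
    double &at(std::size_t r, std::size_t c) override { return rows[r][c]; }
private:
    std::vector<std::vector<double>> rows;
};

// Written once against the base class; works with either layout.
double sumAll(Dataset &x) {
    double s = 0.0;
    for (std::size_t r = 0; r < x.n; ++r)
        for (std::size_t c = 0; c < x.d; ++c)
            s += x.at(r, c);
    return s;
}
```

Note that in this sketch every element access goes through a virtual call, which the compiler generally cannot inline in a hot loop; that is one concrete form of the optimization cost this kind of separation can bring.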
The danger, of course, is that the more separation we put between the code using Dataset and the data itself, the more opportunity we may lose to optimize accesses.
This came up because:
- it would be nice to allow using 2-d arrays
- we are building Python bindings where 2-d arrays are natural