fast-kmeans

Moving to 2-d array representations

Open ghamerly opened this issue 6 years ago • 0 comments

The Dataset class uses a 1-dimensional array which is indexed like a 2-dimensional array almost everywhere. It would be good to switch to an explicit 2-dimensional indexing scheme.

This could be done in either of two ways. One is to convert the data array to a 2-dimensional array (double pointer) and still allow direct access to that 2-d array. The other is to replace every direct access of the form Dataset::data[r * d + c] with a call to Dataset::at(r, c), where r and c are the row and column, respectively, and thereby disallow direct access to the underlying data array.
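
To make the accessor option concrete, here is a minimal sketch of what an at(r, c) wrapper over flat row-major storage might look like. The class name DatasetSketch, its members, and the use of std::vector are assumptions for illustration only; this is not the actual Dataset class in fast-kmeans.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Dataset; names and layout are assumed for illustration.
class DatasetSketch {
public:
    // n rows (records), d columns (dimensions), zero-initialized.
    DatasetSketch(std::size_t n, std::size_t d) : n_(n), d_(d), data_(n * d, 0.0) {}

    // Row-major access: element (r, c) lives at flat offset r * d + c.
    double &at(std::size_t r, std::size_t c) {
        assert(r < n_ && c < d_);  // bounds check, compiled out under NDEBUG
        return data_[r * d_ + c];
    }

    // Read-only overload for const datasets.
    double at(std::size_t r, std::size_t c) const {
        assert(r < n_ && c < d_);
        return data_[r * d_ + c];
    }

    std::size_t rows() const { return n_; }
    std::size_t cols() const { return d_; }

private:
    std::size_t n_, d_;
    std::vector<double> data_;  // flat storage, no longer visible to callers
};
```

Because at() is defined inline in the class, a compiler can usually reduce it to the same r * d + c arithmetic that callers write by hand today, though that would need to be measured rather than assumed.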

If we do the latter, we abstract the actual representation even further, which makes the code easier to adapt. In fact, we could have Dataset1d and Dataset2d classes that are both subclasses of Dataset, where Dataset1d is just like what we have now and Dataset2d uses a 2-d representation. Then we could choose which one to use when we receive some data. (We could even go so far as to develop subclasses to support column-major vs. row-major, but that's probably overkill.)
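
The subclass idea might be sketched as follows. Everything here is hypothetical: DatasetBase, Dataset1dSketch, Dataset2dSketch, and sumAll are illustrative names, the storage uses std::vector rather than raw pointers, and the real Dataset class may be shaped quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical common interface: callers only see at(r, c).
class DatasetBase {
public:
    DatasetBase(std::size_t n, std::size_t d) : n_(n), d_(d) {}
    virtual ~DatasetBase() = default;
    virtual double &at(std::size_t r, std::size_t c) = 0;
    std::size_t rows() const { return n_; }
    std::size_t cols() const { return d_; }

protected:
    std::size_t n_, d_;
};

// Flat row-major storage, analogous to the current 1-d layout.
class Dataset1dSketch : public DatasetBase {
public:
    Dataset1dSketch(std::size_t n, std::size_t d)
        : DatasetBase(n, d), data_(n * d, 0.0) {}
    double &at(std::size_t r, std::size_t c) override { return data_[r * d_ + c]; }

private:
    std::vector<double> data_;
};

// One separately allocated array per row, i.e. a 2-d representation.
class Dataset2dSketch : public DatasetBase {
public:
    Dataset2dSketch(std::size_t n, std::size_t d)
        : DatasetBase(n, d), rows_(n, std::vector<double>(d, 0.0)) {}
    double &at(std::size_t r, std::size_t c) override { return rows_[r][c]; }

private:
    std::vector<std::vector<double>> rows_;
};

// Code written against the base class works with either layout.
double sumAll(DatasetBase &x) {
    double total = 0.0;
    for (std::size_t r = 0; r < x.rows(); ++r)
        for (std::size_t c = 0; c < x.cols(); ++c)
            total += x.at(r, c);
    return total;
}
```

Note that in this sketch at() is virtual, so each element access may go through an indirect call that the compiler cannot always inline. A template-based design could avoid that, at the cost of choosing the layout at compile time.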

The danger, of course, is that the more separation we put between the code using Dataset and the data itself, the more opportunities to optimize accesses we may lose.

This came up because:

  • it would be nice to allow using 2-d arrays
  • we are building Python bindings where 2-d arrays are natural

ghamerly • May 30 '18 20:05