sparse icon indicating copy to clipboard operation
sparse copied to clipboard

What does `compressed_axes = None` mean?

Open hameerabbasi opened this issue 2 years ago • 4 comments

I was going through the GCXS code trying to fix bugs when I came across a few places where compressed_axes was being set to None. That, in my opinion, should never happen. It should always be a tuple of integers.

@daletovar Could you specify the context here, and what could be needed to remove this?

hameerabbasi avatar Jan 06 '22 09:01 hameerabbasi

There are a few cases where compressed_axes is set to None. In the case of 0- or 1-dimensional GCXS array, there are no axes to compress so we set compressed_axes to None. In the constructor and many of the array methods like resize and reshape, compressed_axes defaults to None. This is a way for the user to indicate that they don't have a preference. This does not mean that the resulting array will have arr.compressed_axes is None. Instead, we generally do something like the following:

if compressed_axes is None:
    compressed_axes = (np.argmin(shape),) # use compressed_axes that results in best compression

I think that is makes sense to have a default value for the cases that users don't have a preference. I used None for that value.

On the one hand I don't think it's really an issue because any array with ndim > 1 is going to have a compressed_axes that is a tuple of integers:

>>> a = sparse.random((5,6,7), format='gcxs', compressed_axes=None) 
>>> a.compressed_axes
(0,)

On the other hand I can see how such behavior could be confusing. In terms of removing it, I think that it should stay for 0- and 1-dimensional arrays. For higher dimensional arrays, we could pick a different default value - something like compressed_axes='opt' to indicate compressed_axes with the optimal compression.

daletovar avatar Jan 06 '22 21:01 daletovar

There are a few cases where compressed_axes is set to None. In the case of 0- or 1-dimensional GCXS array, there are no axes to compress so we set compressed_axes to None.

I think this may mislead people, and make it hard to get at the underlying format. Would compressed_axes in this case be () or (0,)? I think you mean there's very little choice in how things are stored, but that doesn't necessarily mean that there isn't a valid representation in terms of tuples of integers.

In the constructor and many of the array methods like resize and reshape, compressed_axes defaults to None. This is a way for the user to indicate that they don't have a preference. This does not mean that the resulting array will have arr.compressed_axes is None.

This seems reasonable to me, as long as there is a concrete compressed_axes specified in the underlying array.

I think that it should stay for 0- and 1-dimensional arrays.

As said above we could use either () or (0,) as appropriate.

For higher dimensional arrays, we could pick a different default value - something like compressed_axes='opt' to indicate compressed_axes with the optimal compression.

I think None is a perfectly reasonable value to indicate "I don't have a preference here."

hameerabbasi avatar Jan 07 '22 08:01 hameerabbasi

Okay, that makes sense to me. I like () for 0- and 1-dimensional arrays.

daletovar avatar Jan 07 '22 17:01 daletovar

Just bumped into this, there are cases e.g. in to_scipy_sparse that assume non-None compressed axes, but a GCXS array can still be constructed with compressed_axes=None

mangecoeur avatar Nov 20 '22 22:11 mangecoeur