sparse
sparse copied to clipboard
What does `compressed_axes = None` mean?
I was going through the GCXS
code trying to fix bugs when I came across a few places where compressed_axes
was being set to None
. That, in my opinion, should never happen. It should always be a tuple of integers.
@daletovar Could you specify the context here, and what could be needed to remove this?
There are a few cases where compressed_axes
is set to None
. In the case of 0- or 1-dimensional GCXS
array, there are no axes to compress so we set compressed_axes
to None
. In the constructor and many of the array methods like resize
and reshape
, compressed_axes
defaults to None
. This is a way for the user to indicate that they don't have a preference. This does not mean that the resulting array will have arr.compressed_axes is None
. Instead, we generally do something like the following:
if compressed_axes is None:
compressed_axes = (np.argmin(shape),) # use compressed_axes that results in best compression
I think that is makes sense to have a default value for the cases that users don't have a preference. I used None
for that value.
On the one hand I don't think it's really an issue because any array with ndim > 1
is going to have a compressed_axes
that is a tuple of integers:
>>> a = sparse.random((5,6,7), format='gcxs', compressed_axes=None)
>>> a.compressed_axes
(0,)
On the other hand I can see how such behavior could be confusing. In terms of removing it, I think that it should stay for 0- and 1-dimensional arrays. For higher dimensional arrays, we could pick a different default value - something like compressed_axes='opt'
to indicate compressed_axes
with the optimal compression.
There are a few cases where
compressed_axes
is set toNone
. In the case of 0- or 1-dimensionalGCXS
array, there are no axes to compress so we setcompressed_axes
toNone
.
I think this may mislead people, and make it hard to get at the underlying format. Would compressed_axes
in this case be ()
or (0,)
? I think you mean there's very little choice in how things are stored, but that doesn't necessarily mean that there isn't a valid representation in terms of tuples of integers.
In the constructor and many of the array methods like
resize
andreshape
,compressed_axes
defaults toNone
. This is a way for the user to indicate that they don't have a preference. This does not mean that the resulting array will havearr.compressed_axes is None
.
This seems reasonable to me, as long as there is a concrete compressed_axes
specified in the underlying array.
I think that it should stay for 0- and 1-dimensional arrays.
As said above we could use either ()
or (0,)
as appropriate.
For higher dimensional arrays, we could pick a different default value - something like
compressed_axes='opt'
to indicatecompressed_axes
with the optimal compression.
I think None
is a perfectly reasonable value to indicate "I don't have a preference here."
Okay, that makes sense to me. I like ()
for 0- and 1-dimensional arrays.
Just bumped into this, there are cases e.g. in to_scipy_sparse that assume non-None compressed axes, but a GCXS array can still be constructed with compressed_axes=None