SparseArrays.jl
SparseArrays.jl copied to clipboard
SparseArrays: export nonzeroinds for sparse vectors and sparse column views
As far as I know, we currently export access to nonzero values to sparse vectors and sparse column views, via SparseArrays.nonzeros. However, the corresponding SparseArrays.nonzeroinds is not exported, which makes these two methods of SparseArrays.nonzeros quite useless.
I think we should just write docs and export it: Users of sparse vectors / sparse column views are likely to rely on SparseArrays.nonzeroinds already, and would already be affected by any reorganization.
It might be worth simultaneously executing on the exported-API nomenclature shift from nonzeros to stored that various parties have been discussing for some time; i.e., rather than a new nonzeroinds export, export storedinds and simultaneously introduce and export storedvals. Best!
If this is done I would advocate it should be in a slightly more descriptive manner. Such that there is also something for SparseMatrixCSC and potentially per JuliaLang/julia#31329 SparseMatrixCSR.
An interface that got directly at the column pointers would be great as well.
@Sacha0 for matrices I think that we should also make sure that the methods are representative of the stored structures. So rowvals I think makes sense but nzrange is a little ambiguous.
For vectors what about indvals since that is more in line with the matrix notation.
@Sacha0 So you are saying new functions storedinds and storedvals that are only defined for sparse vectors, sparse column views (and potentially transposes of sparse vectors / sparse column views)? And nonzeroinds gets retained for the time, until eventually nonzeros for sparse vectors and nonzeroinds get deprecated?
IMHO for SparseMatrixCSC we should stick to the exported interface nonzeros - rowvals - nzrange and adapt the interface for SparseVector accordingly.
That means: replace SparseArrays.nonzeroinds by rowvals and formally introduce nzrange(V, col) = 1:length(V) where V<:SparseVector.
That would allow to handle Union{SparseVector,SparseMatrixCSC} uniformly.
The unexported storedinds, storedvals, ... which were introduced to achieve this uniformity, would not be needed anymore.
It seems to me that Julia has been trying to move towards semantic correctness. While it is semantically correct to ask for the rowvals(vec::SparseColumnView) , it is not correct to ask for the rowvals(vec::SparseVector) as it stands today where a vector is the analog of a "row matrix" or "row vector".
If you want an orientation agnostic method lets try to name it so.
I can't find the relevant discussions immediately --- if someone has the bandwidth to do the necessary issue tracker spelunking, that would be great! --- so in brief I'll reiterate the motivations for moving towards storedinds / storedvals:
-
Some time ago we moved away from eagerly stripping numerical zeros from the structure of
SparseMatrixCSCs, and in conjunction began moving away from thenonzerosterminology / towards the now-more-correctstored. -
The
SparseVectorandSparseMatrixCSCdata structures are sufficiently similar that it's possible to build an abstraction layer (abstracting away their field name differences, and otherwise hiding e.g. the absence of column pointers inSparseVector) that enables unification of implementation of a nice set of operations applicable to bothSparseVectors andSparseMatrixCSCs. Unifying field names between these data structures insofar as possible facilitates and simplifies realization of that abstraction layer. -
Supporting
SparseMatrixCSRat some point would be great. Minimizing the amount of implementation duplication betweenSparseVector,SparseMatrixCSC, andSparseMatrixCSRin so doing would be great. One way to facilitate that implementation deduplication is to extend the preceding abstraction layer to include the hypotheticalSparseMatrixCSR, and unifying the field names of those three types insofar as possible would facilitate and simplify the realization of that abstraction layer.(More concretely, the data structure underlying CSC is fundamentally the same as that for CSR. So one can construct a type for that data structure, e.g.
SparseMatrixCSX, with a parameter indicating whether theXisCorR, and makeSparseMatrixCSCandSparseMatrixCSRcorresponding type aliases. One could generalize the thing still further and implementSparseVectoron top of it too.) -
Going a step further, it would be great to extend that abstraction layer to
Adjoint,Transpose, and other wrapped types of sparse matrices/vectors, such that... you see where this is going :) ... generic implementations can be written for operations common to all of those types. And so it would be great if the nomenclature could make sense for all these things.
Hence storeinds and storedvals, and potentially pointers or something similar. IIRC, we didn't have enough deprecation cycles and developer bandwidth to make this happen prior to 1.0, but the plan was to use dot-overloading to introduce aliases for the present field names, deprecate the present field names, and then in a future release cycle swap the aliases for the field names. Best!
@Sacha0 this vision of a general, concise, and consistent sparse object has my full support. I proposed the (as I know now) legacy interface nonzeros - rowvals - nzrange because that are currently (v1.2) the only exported functions.
I also find it necessary to make a difference between the field names (which should not be used by the user at all), and the accessor methods which could be named storedvals - storedinds - indrange or similar. These may be consistently extended to SparseMatrixCSX, SparseVector, and more or less straight forward to SubArray{,Sparse...}. The last one is subject of a PR I am currently busy with.
I use indrange instead of pointer_range to keep the analogy with nzrange.
There should typically be no need to access the fields directly. All sparse or wrapped sparse types supporting this interface could make use of the standard column-wise (for CSR row-wise) processing loop: (col, row, and the constant 2 should be replaced by generic names for CSR)
for col in axes(A, 2), j in indrange(A, col)
row = storedinds(A, k)
val = storedvals(A, k)
... do the job for (col, row, val) ...
end
@Sacha0 I couldn't agree with this more:
(More concretely, the data structure underlying CSC is fundamentally the same as that for CSR. So one can construct a type for that data structure, e.g. SparseMatrixCSX, with a parameter indicating whether the X is C or R, and make SparseMatrixCSC and SparseMatrixCSR corresponding type aliases. One could generalize the thing still further and implement SparseVector on top of it too.)
To take it further, beyond using the same data structure for matrix and vectors (you are only saving 3 Int), it strikes me that without much difficulty you could also use the same data structure for uncompressed sparse where you leave room in a given"column" to insert more rows.