SparseArrays.jl
SparseArrays.jl copied to clipboard
Common abstractions for nonstructural indicies between Dense and Sparse Matrices
indices is our abstraction for getting the indices so one can iterate the index of a matrix.
It works great for dense matrices.
However, for sparse matrices, it is almost never what you want to do.
You instread only want to iterate the indicies of the nonstructural elements.
(because there are less of them)
That is done using rowvals, and nzrange
In a dense matrix one can say that all the elements are nonstructural. So in a circumstance when you only want to iterate through the nonstructural elements, you want to iterate through all the elements in a dense matrix. More generally with a matrix of unknown type, you want to iterate through all the elements.
I thus propose that we should have an abstraction for getting the nonstructural indicies, which falls back to getting all the indiices, to make it easier to write code that works efficiently on spare matricies, and also works (as fast as is possible) on other types.
Some thing like
const SparseMatrix = AbstractSparseArray{<:Any,<:Any,2}
colinds(A::AbstractMatrix) = indices(A,2)
colinds(A::SparseMatrix) = rowvals(A)
rowinds(A::AbstractMatrix, col::Integer) = indices(A,1)
rowinds(A::SparseMatrix, col::Integer) = nzrange(A, col)
(bikeshed on names pending)
There is also a similar relationship between nonzeros (sparse) and vec (dense/fallback).
I was discussing this on slack with @mbauman and @StefanKarpinski the other day, and wanted to put it on GitHub before it was lost to the ages
Somewhat related is https://github.com/JuliaLang/julia/issues/15648, but I like how this turns the problem on its head: instead of trying to wedge sparse matrices into a generic API, we simply "lift" dense array support onto abstract sparse APIs. This seems far more tractable.
Some of this abstract sparse API is already in place for broadcasting; it'd be interesting to see what the performance difference would be were we to emulate sparse APIs for structured and dense matrices instead of converting them to actual AbstractSparseArrays.
Oh I think this is row I think nzrange doesn't quiet do this?
But the point stands
I think what I ment to have is:
colinds(A::AbstractMatrix) = indices(A,2)
rowinds(A::AbstractMatrix, col::Integer) = indices(A,1)
rowinds(A::SparseMatrixCSC, col::Integer) = rowvals(A)[nzrange(A, col)]
That API seems to be tied pretty closely to CSC layout. How about an each_nonstructural_index (name subject to bikeshedding, of course)? Iterating over its return value would need to produce indices that allow efficient getindex but are still (reasonable efficiently) convertible to LinearIndex/CartesianIndex if they aren't one of those already.
So the following should be reasonably efficient and leave A==B:
B = zeros(size(A))
for i in each_nonstructural_index(A)
B[LinearIndices(A)[i]] = A[i]
end
you're right, my proposal would not work for say CRC. Or any row major format really. But we do not have such a thing. So maybe it does not matter.
Take a look at the example of use in the Chinese Whisphers clustering.
Does that translate sanely to something like
colinds, rowinds = zip(each_nonstructural_index(A)...) ?
(I've not thought too hard if it does or not. I think it does not -- at least not that simply)
it'd be interesting to see what the performance difference would be were we to emulate sparse APIs for structured and dense matrices instead of converting them to actual AbstractSparseArrays.
This was my eventual hope with the sparse broadcast infrastructure. Perhaps in a few months... :)
Somewhat related is JuliaLang/julia#15648, but I like how this turns the problem on its head: instead of trying to wedge sparse matrices into a generic API, we simply "lift" dense array support onto abstract sparse APIs. This seems far more tractable.
For what it's worth, I came up with the same idea/need when trying to implement efficient sparse-sparse, dense-sparse, and sparse-dense outer products for my own needs (and tried to make an attempt at something committable in JuliaLang/julia#24980, but that's obviously fallen far behind in all the linear algebra and broadcasting changes since then).
So I guess with respect to
That API seems to be tied pretty closely to CSC layout. How about an each_nonstructural_index (name subject to bikeshedding, of course)?
I'm in favor of having a portable way of doing CSC-like operations on dense arrays given the fact that CSC-aware algorithms can easily consume dense arrays that way for free (and then a more generic sparse infrastructure can be added later).
I guess we can have
- Maximally abstract:
indicies, and maybe anonstructual_indicieslike @martinholters suggests - Moderately abstract Column major methods that I proposed above (works for CSC/Dense)
- (Possibly could exend those with a flag that makes them work row major? Or just define then names to not say row or column...) - Minimally abstract:
nzrange,rowvals(only CSC)