Sparse matrix support?
Do any of the supported core.matrix implementations cover sparse matrices? If not, what would you recommend for inclusion?
Sparse matrix support is at an early stage right now. We still need to do some design work to figure out the best way to add this to the API.
As far as implementations go:
- vectorz-clj (https://github.com/mikera/vectorz-clj) has some sparse matrix support, though you'll need to use some Java interop to create the right instances (via Matrixx.createSparse and friends). I haven't yet got round to exposing this functionality via a Clojure API
- core.matrix itself has some experimental support for treating a Clojure map as a sparse matrix (i.e. a map of index vector -> value), though it's not very efficient
P.S. Was there a particular use case you had in mind?
I've found that the actual overhead of sparse matrices often doesn't justify the theoretical benefits, unless your matrices are really big and very sparse.... (e.g. matrices representing connections in graphs with very many nodes and very few edges per node)
Generally, I'd tend to use a sparse implementation if:
(a) the matrix is large (e.g. bigger than 500 x 500), and (b) the number of nonzero entries scales as O(N) (as opposed to the O(N^2) growth of a dense matrix) or some other power that's significantly sub-2.
For small matrices, I'd rather use a robust dense implementation in most circumstances. But at 100k+ rows, the memory footprint of a dense matrix-- e.g. 10 billion floats when you only have ~500k non-zero entries-- becomes unacceptable.
For example, convolutions (for smoothing) tend to involve "banded" matrices that are only nonzero near the diagonal, e.g. a tridiagonal smoothing matrix
| [3 1 0 0 0 ... 0 0] | [1 2 1 0 0 ... 0 0] | [0 1 2 1 0 ... 0 0] | [0 0 1 2 1 ... 0 0] | [. . . . . ... . .] | [0 0 0 0 0 ... 2 1] | [0 0 0 0 0 ... 1 3]
That fits the O(N) criteria, since the number of nonzeroes is 3*N-2. At N = 100k, you'd really want a sparse implementation.
Without that, you can usually avoid creating the matrix explicitly and fold the linear transformation into what it acts upon, but ideally a sparse matrix implementation would be faster than that (just as Lapack's offerings are typically orders of magnitude faster than hand-rolled matrix operations).
Great, thanks for clarifying your use cases.
The functions on specialised matrices can be supported by the regular core.matrix protocols, so inner-product on diagonal matrices should be O(n) for example. This is of course dependent on the underlying implementation.
As mentioned, vectorz-clj already has support for various types of specialised matrices, including diagonal matrices, general sparse matrices and permutation matrices. No tridiagonal matrices yet, but those and other banded matrices should be fairly easy to add.
There are still a few other things that need to be done on the core.matrix side though to get this all fully supported:
- Sparse matrix construction functions (maybe using index vector -> value maps?)
- Relevant predicates e.g.
sparse? - Functions like
density - Banded matrix functions:
upper-bandwidth,bandetc.
Contributions in these areas are of course all very welcome!
Update: I've added some experimental banded matrix support in Vectorz. Should be available in the next release of vectorz-clj. See:
https://github.com/mikera/vectorz/blob/develop/src/main/java/mikera/matrixx/impl/BandedMatrix.java
I see there are some sparse matrix classes in Vectorz. Are you planning on adding them and other sparse matrix types to the KNOWN_IMPLEMENTATIONS map so that users can explicitly require a sparse implementation. AFAICT the only current options are: create a :persistent-map or create a non-sparse matrix and convert it with a call to sparse.
Related to this, it'd be convenient to have a function for getting all the non-zero indices of a matrix.
Thanks for the work you've done on improving linear algebra support in Clojure.
"Related to this, it'd be convenient to have a function for getting all the non-zero indices of a matrix." My current project could make use of this if it existed, fwiw.
Yes, I'm definitely planning to expose the sparse matrix support from Vectorz into core.matrix.
The work to be done is roughly as follows:
- Get the right API and protocols in core.matrix for creating and manipulating sparse matrices
- Hook up the right Vectorz classes in vectorz-clj
- Do lots of testing - sparse matrices have very different performance characteristics and there will be a lot of optimisations to do
Help with any of the above would be greatly appreciated (especially if you want to use Sparse matrices in the near future! It's much better to implement and test this stuff with a real use case)
@mars0i - regarding getting the non-zero indices, what format would you like these? A sequential list of [i j] pairs in row-major order?
Just trying to figure out the best API that is both useful and efficient...
@mattrepl - P.S. You don't actually need to add these classes to KNOWN_IMPLEMENTATIONS since these classes are already part of the :vectorz implementation.
The tricky bit is constructing them - that's where we need a bit more thinking on the API
Thanks, I'll work on a branch and see what I come up with.
On Feb 19, 2014, at 21:49, Mike Anderson [email protected] wrote:
@mattrepl - P.S. You don't actually need to add these classes to KNOWN_IMPLEMENTATIONS since these classes are already part of the :vectorz implementation.
The tricky bit is constructing them - that's where we need a bit more thinking on the API
— Reply to this email directly or view it on GitHub.
New issue for the non-zero index support:
https://github.com/mikera/core.matrix/issues/102