rfunctions geninv on sparseMatrix

First of all, thank you very much for this great package! I've found it by looking for alternate solutions to MASS:ginv for the Moore-Penrose generalized inverse. I have rather large spare matrices, and I wondered, as rfunctions already depends on Matrix, if it's possible to extend the package to be able to make geninv compatible with sparse matrices?

A quick example of such matrix:

library(Matrix)
n <- 1e5
m <- sparseMatrix(1:n, 1:n, x = 1)
m <- do.call(rBind, lapply(1:10, function(i) {
    set.seed(i)
    m[-sample(1:n, n/3), ]
}))
m <- t(m) %*% m

May 27 '14 09:05 daroczig

Sure, I should be able to do that. It might be a couple of weeks, as I've got some other things going on. Just so you know, the paper from which this method is derived has some flawed theory (http://arxiv.org/ftp/arxiv/papers/0804/0804.4809.pdf). There are certain instances when it gives wrong answers. I should put a warning up about that. I've been looking into alternative algorithms but it's a low priority for me

On 05/27/14, Gergely Daróczi wrote:

First of all, thank you very much for this great package! I've found it by looking for alternate solutions to MASS:ginv for the Moore-Penrose generalized inverse. I have rather large spare matrices, and I wondered, as rfunctions already depends on Matrix, if it's possible to extend the package to be able to make geninv compatible with sparse matrices?

A quick example of such matrix:

library(Matrix) n <- 1e5 m <- sparseMatrix(1:n, 1:n, x = 1) m <- do.call(rBind, lapply(1:10, function(i) { set.seed(i) m[-sample(1:n, n/3), ] })) m <- t(m) %*% m

— Reply to this email directly or view it on GitHub(https://github.com/jaredhuling/rfunctions/issues/1).

Jared Huling Graduate Student Department of Statistics University of Wisconsin-Madison

May 27 '14 17:05 jaredhuling

Awesome, thank you @jaredhuling! The described extremely fast algorithm with the low probability of errors is a great trade-off, and it's highly welcomed here. I will keep tabs on rfunctions, and I am looking forward to see the updates on making it compatible with sparse matrices. Please let me know if I can help you with that process.

May 28 '14 11:05 daroczig

I added sparse matrix support (namely SparseMatrix format from the Matrix package). Note that while this is faster for sparse matrices, the returned inverse will in general not be sparse. Inverting a sparse matrix rarely results in a sparse matrix, so you might not see much benefit overall

Jun 15 '14 20:06 jaredhuling

It seems that this algorithm tends to given incorrect results more frequently for sparse matrices. I don't recommend using it. I will replace it when I find a good alternative

Jun 15 '14 21:06 jaredhuling

I got some great hints from Gabor Grothendieck on SO to drop all the rows and columns consisting of zeros in the sparse matrix before computing the inverse, which is a very nifty hack indeed. It worked pretty fine here, e.g.:

> m <- matrix(0, 5, 5)
> for (i in 1:5) {
+     set.seed(i+3)
+     x <- sample(1:5, 1)
+     m[x, x] <- i
+ }
> ix <- union(which(rowSums(m) > 0), which(colSums(m) > 0))
> minv <- m
> minv[ix, ix] <- ginv(m[ix, ix])
> minv
     [,1] [,2] [,3]      [,4] [,5]
[1,]    0  0.0  0.0 0.0000000 0.00
[2,]    0  0.5  0.0 0.0000000 0.00
[3,]    0  0.0  0.2 0.0000000 0.00
[4,]    0  0.0  0.0 0.3333333 0.00
[5,]    0  0.0  0.0 0.0000000 0.25
> geninv(m)
     [,1] [,2] [,3]      [,4] [,5]
[1,]    0  0.0  0.0 0.0000000 0.00
[2,]    0  0.5  0.0 0.0000000 0.00
[3,]    0  0.0  0.2 0.0000000 0.00
[4,]    0  0.0  0.0 0.3333333 0.00
[5,]    0  0.0  0.0 0.0000000 0.25

Which suggests that the inverse of sparse matrices will be sparse. Nope?

I will test your algorithm in the near future, thank you very much!

Jun 15 '14 21:06 daroczig

sure, there are definitely cases where the inverse of a sparse matrix is sparse, but that's not true for many cases (most I've come across)

library(rfunctions)
#create matrix with 10% nonzero values
x <- simSparseMatrix(0.9, dim = c(50, 50))
geninv(x)
# note that it's not sparse

Jun 15 '14 23:06 jaredhuling

Yeah, thanks God I deal with much sparser matrices with less than 0.01 percent of non-zero values (although with a few tens and even hundreds of thousands of rows/columns). I think this solution will just work fine with tasks, but I will report back again after some decent testing.

Jun 15 '14 23:06 daroczig

For that size geninv() as it currently stands won't be very memory-efficient. I'm working on a way to fix that

Jun 15 '14 23:06 jaredhuling

I think for problems of your size you will want to use this library http://crd-legacy.lbl.gov/~xiaoye/SuperLU/

Jun 16 '14 06:06 jaredhuling