geninv on sparseMatrix
First of all, thank you very much for this great package! I've found it by looking for alternate solutions to MASS:ginv for the Moore-Penrose generalized inverse. I have rather large spare matrices, and I wondered, as rfunctions already depends on Matrix, if it's possible to extend the package to be able to make geninv compatible with sparse matrices?
A quick example of such matrix:
library(Matrix)
n <- 1e5
m <- sparseMatrix(1:n, 1:n, x = 1)
m <- do.call(rBind, lapply(1:10, function(i) {
set.seed(i)
m[-sample(1:n, n/3), ]
}))
m <- t(m) %*% m
Sure, I should be able to do that. It might be a couple of weeks, as I've got some other things going on. Just so you know, the paper from which this method is derived has some flawed theory (http://arxiv.org/ftp/arxiv/papers/0804/0804.4809.pdf). There are certain instances when it gives wrong answers. I should put a warning up about that. I've been looking into alternative algorithms but it's a low priority for me
On 05/27/14, Gergely Daróczi wrote:
First of all, thank you very much for this great package! I've found it by looking for alternate solutions to MASS:ginv for the Moore-Penrose generalized inverse. I have rather large spare matrices, and I wondered, as rfunctions already depends on Matrix, if it's possible to extend the package to be able to make geninv compatible with sparse matrices?
A quick example of such matrix:
library(Matrix) n <- 1e5 m <- sparseMatrix(1:n, 1:n, x = 1) m <- do.call(rBind, lapply(1:10, function(i) { set.seed(i) m[-sample(1:n, n/3), ] })) m <- t(m) %*% m
— Reply to this email directly or view it on GitHub(https://github.com/jaredhuling/rfunctions/issues/1).
Jared Huling Graduate Student Department of Statistics University of Wisconsin-Madison
Awesome, thank you @jaredhuling! The described extremely fast algorithm with the low probability of errors is a great trade-off, and it's highly welcomed here. I will keep tabs on rfunctions, and I am looking forward to see the updates on making it compatible with sparse matrices. Please let me know if I can help you with that process.
I added sparse matrix support (namely SparseMatrix format from the Matrix package). Note that while this is faster for sparse matrices, the returned inverse will in general not be sparse. Inverting a sparse matrix rarely results in a sparse matrix, so you might not see much benefit overall
It seems that this algorithm tends to given incorrect results more frequently for sparse matrices. I don't recommend using it. I will replace it when I find a good alternative
I got some great hints from Gabor Grothendieck on SO to drop all the rows and columns consisting of zeros in the sparse matrix before computing the inverse, which is a very nifty hack indeed. It worked pretty fine here, e.g.:
> m <- matrix(0, 5, 5)
> for (i in 1:5) {
+ set.seed(i+3)
+ x <- sample(1:5, 1)
+ m[x, x] <- i
+ }
> ix <- union(which(rowSums(m) > 0), which(colSums(m) > 0))
> minv <- m
> minv[ix, ix] <- ginv(m[ix, ix])
> minv
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0.0 0.0 0.0000000 0.00
[2,] 0 0.5 0.0 0.0000000 0.00
[3,] 0 0.0 0.2 0.0000000 0.00
[4,] 0 0.0 0.0 0.3333333 0.00
[5,] 0 0.0 0.0 0.0000000 0.25
> geninv(m)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0.0 0.0 0.0000000 0.00
[2,] 0 0.5 0.0 0.0000000 0.00
[3,] 0 0.0 0.2 0.0000000 0.00
[4,] 0 0.0 0.0 0.3333333 0.00
[5,] 0 0.0 0.0 0.0000000 0.25
Which suggests that the inverse of sparse matrices will be sparse. Nope?
I will test your algorithm in the near future, thank you very much!
sure, there are definitely cases where the inverse of a sparse matrix is sparse, but that's not true for many cases (most I've come across)
library(rfunctions)
#create matrix with 10% nonzero values
x <- simSparseMatrix(0.9, dim = c(50, 50))
geninv(x)
# note that it's not sparse
Yeah, thanks God I deal with much sparser matrices with less than 0.01 percent of non-zero values (although with a few tens and even hundreds of thousands of rows/columns). I think this solution will just work fine with tasks, but I will report back again after some decent testing.
For that size geninv() as it currently stands won't be very memory-efficient. I'm working on a way to fix that
I think for problems of your size you will want to use this library http://crd-legacy.lbl.gov/~xiaoye/SuperLU/