mathnet-numerics
SVD routine for sparse matrices (to handle huge matrices like 500Kx500K)
Hello,
I would like to suggest implementing an SVD routine for sparse matrices. The main motivation is recommender system algorithms: such algorithms initialize their input matrices (which are products of an SVD) from huge sparse data arrays (on the order of 500 000 x 500 000). The resulting SVD factors are four dense 500K x 500K matrices. As a result, this volume of data cannot fit in memory, and therefore the SVD from Math.NET cannot be used for such sparse matrices.
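For a sense of scale, here is a back-of-the-envelope estimate of what a single dense factor of that size would need (only the 500K dimension comes from the post above; the arithmetic and the 8-bytes-per-double assumption are mine):

// Rough memory estimate for one dense 500,000 x 500,000 factor of doubles.
const long n = 500_000;
long bytes = n * n * sizeof(double);                       // 2.5e11 entries * 8 bytes each
Console.WriteLine($"{bytes / 1e12:F0} TB per dense factor"); // prints "2 TB"

So even a single dense factor is on the order of terabytes, well beyond what can be held in RAM.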
A few more discussions on the topic can be found here: https://mathnetnumerics.codeplex.com/discussions/441313
Best regards, Alex.
Just for reference, some existing sparse SVD routines with compatible licenses:
- http://tedlab.mit.edu/~dr/SVDLIBC/ (C)
- http://soi.stanford.edu/~rmunk/PROPACK/index.html (Fortran)
Thank you for the references (and for the attention as well)! C is C and not C#, though, so it seems a lot of testing/debugging will be needed to rewrite the code. I've googled quite a lot and did not find any ready C# libraries that provide an SVD routine for sparse matrices :(
It looks like SVDLIBC is a fork of SVDPACKC but only includes one algorithm from it - the one that seems to be the "best" general version if I read this right. It should be easy to port and parallelize, and is BSD licensed.
I'll attempt a port next month, but anyone else feel free to.
Hi, thank you! I am really looking forward to it! My skills are not good enough to contribute here, but I can test the solution at different scales and with different numbers of cores. Best regards, Alex.
@AlexKNM I just picked up a new contract, and with a vacation coming up it will be a while before I get to this.
Hello, I am wondering if you have any suggestions for how to handle sparse matrix SVDs. I know this issue has been open for a while, and I'm just looking for any more modern ways to handle it (preferably in C#, or at least .NET compatible).
The SVD calculation in Math.NET does not seem to handle sparse matrices in any performant way, so I've been using dense matrices, but that takes too much memory for empty space (99.99% zeros). For reference, my test-run matrix is approximately 20,000 x 20,000 doubles.
using System;
using MathNet.Numerics;

Control.UseNativeMKL();
Control.MaxDegreeOfParallelism = Environment.ProcessorCount; // # of logical cores
MathNet.Numerics.LinearAlgebra.Factorization.Svd<double> svd = myMatrix.Svd(true); // <-- this line is slow if myMatrix is Sparse. Runs fine if it's Dense.
Thank you!
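For reference, one workaround often used in this situation is a randomized truncated SVD (in the spirit of Halko, Martinsson & Tropp), which only touches the sparse matrix through matrix products and never forms the full dense factors. The sketch below is written against Math.NET Numerics' public matrix API but is not a feature of the library; the method name, the target rank k, and the oversampling parameter are illustrative choices, not anything from this thread.

using MathNet.Numerics.Distributions;
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.LinearAlgebra.Factorization;

// Approximate rank-k SVD of a (possibly sparse) matrix A using a random range sketch.
// A is only used in matrix products, so its sparse storage is never expanded.
static (Matrix<double> U, Vector<double> S, Matrix<double> VT) RandomizedTruncatedSvd(
    Matrix<double> a, int k, int oversampling = 10)
{
    int sketch = k + oversampling;

    // Sample the range of A with a Gaussian test matrix: Y = A * Omega.
    var omega = Matrix<double>.Build.Random(a.ColumnCount, sketch, new Normal());
    var y = a.Multiply(omega);                              // m x sketch

    // Orthonormal basis Q of the sampled range, then the small projection B = Q^T * A.
    var q = y.QR(QRMethod.Thin).Q;                          // m x sketch
    var b = q.TransposeThisAndMultiply(a);                  // sketch x n

    // A second thin QR on B^T keeps everything small: B^T = Q2 * R2, with R2 only sketch x sketch.
    var qr2 = b.Transpose().QR(QRMethod.Thin);              // Q2: n x sketch
    var svdR = qr2.R.Svd(true);                             // R2 = Ur * S * Vr^T

    // B = R2^T * Q2^T = Vr * S * (Q2 * Ur)^T, so A is approximated by (Q * Vr) * S * (Q2 * Ur)^T.
    var u = q.Multiply(svdR.VT.Transpose());                // m x sketch
    var vt = qr2.Q.Multiply(svdR.U).Transpose();            // sketch x n

    // Keep only the leading k singular triplets.
    return (u.SubMatrix(0, u.RowCount, 0, k),
            svdR.S.SubVector(0, k),
            vt.SubMatrix(0, k, 0, vt.ColumnCount));
}

For a 20,000 x 20,000 sparse matrix with 99.99% zeros, something like RandomizedTruncatedSvd(myMatrix, k: 100) keeps all dense intermediates at roughly 20,000 x 110, i.e. tens of megabytes instead of gigabytes. The accuracy depends on how quickly the singular values decay, so this is a sketch of the idea rather than a drop-in replacement for a full SVD.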