
Sparse matrices

Open · dselivanov opened this issue on Jan 05 '17 · 9 comments

Great work. Is it possible to extend the package to allow sparse matrices as input?
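For reference, a minimal sketch of what the current interface looks like (toy data, default settings assumed): `biglasso()` expects a `bigmemory::big.matrix` for `X`, so a sparse design currently has to be densified first. The request is to accept something like a `Matrix::dgCMatrix` directly.

```r
# Current usage: X must be a big.matrix (dense, possibly file-backed)
library(bigmemory)
library(biglasso)

X <- as.big.matrix(matrix(rnorm(100 * 20), 100, 20))  # toy dense design
y <- rnorm(100)

fit <- biglasso(X, y, family = "gaussian", penalty = "lasso")

# Requested: pass a sparse matrix (e.g. Matrix::dgCMatrix) without densifying
```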

dselivanov avatar Jan 05 '17 21:01 dselivanov

Thanks for your comment. Yes, that is definitely on my to-do list. I am currently busy with research papers. Hopefully I will be able to add this feature by the end of next month.

YaohuiZeng avatar Jan 06 '17 03:01 YaohuiZeng

Also, it would be great to have benchmarks against SGD-based optimizations. At https://github.com/dselivanov/FTRL I implemented the FTRL algorithm. I found it blazing fast, both in convergence speed and in runtime per non-zero element of the input matrix.

dselivanov avatar Jan 27 '17 15:01 dselivanov

Nice work! How can I convert a sparse matrix to a big.matrix?

hutaohutc avatar Oct 09 '17 12:10 hutaohutc

@hutaohutc Your question is more a bigmemory question than a biglasso one.

privefl avatar Oct 09 '17 12:10 privefl

@hutaohutc As I said, your question will get more attention if you post an issue in the bigmemory repo or ask on Stack Overflow (with the tag r-bigmemory).
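That said, here is a minimal sketch of one way to do the conversion, assuming the sparse matrix fits in RAM as a `dgCMatrix` and copying column by column so the whole matrix is never densified at once:

```r
library(Matrix)
library(bigmemory)

sp <- rsparsematrix(1e4, 500, density = 0.01)  # example sparse input

# File-backed big.matrix: the dense copy lives on disk, not in RAM
bm <- filebacked.big.matrix(nrow(sp), ncol(sp), type = "double", init = 0,
                            backingfile = "X.bin", descriptorfile = "X.desc")
for (j in seq_len(ncol(sp))) {
  bm[, j] <- as.numeric(sp[, j])   # densify one column at a time
}
```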

privefl avatar Oct 09 '17 13:10 privefl

@YaohuiZeng I am also interested in having the package take sparse matrices as input, as my matrix is currently too big to fit on disk.

mm3509 avatar Feb 12 '18 11:02 mm3509

What kind of data do you have, and what are its dimensions? If your data is very sparse, it means that you have variables with very low variability. Maybe try to identify only the columns with some decent variability and put those in a big.matrix.
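A rough sketch of that filtering step, assuming the data is already a `dgCMatrix` called `sp` (the variance threshold below is only an example):

```r
library(Matrix)
library(bigmemory)

# Column variances computed from the sparse representation (no densifying yet)
col_means <- Matrix::colMeans(sp)
col_vars  <- Matrix::colMeans(sp^2) - col_means^2

keep <- which(col_vars > 1e-4)                 # arbitrary example threshold
bm   <- as.big.matrix(as.matrix(sp[, keep]))   # densify only the kept columns
```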

What do you think?

privefl avatar Feb 12 '18 15:02 privefl

I want to estimate a vector auto-regression (VAR) on 10 years (120 months) of US county-level data. I have 120 time periods * 3,118 mainland counties = 374,160 data points. If I were estimating the VAR county by county, I would select one county, run a LASSO regression on all the others, and obtain 3,118 ^ 2 parameters (3,118 parameters for each county on the left-hand side). But I want to estimate it in one go, so I have a matrix of explanatory variables with 1 million regressors, which takes about 27 TB stored densely (although it's 99.997% sparse). The reason to estimate it in one go is that later I want to impose a certain structure on the variance-covariance matrix of errors and do a kind of Feasible Generalized Least Squares.

So, yes, I could find work-arounds to estimate some approximation of the VAR parameters. But I really want to run the whole thing at once...
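For completeness, a minimal sketch of the equation-by-equation work-around, assuming `X_sparse` is the lagged sparse design as a `dgCMatrix` and `Y` holds the 3,118 county series in columns; `glmnet()` accepts sparse input directly, but this fits each equation separately and ignores the error covariance structure I need for the FGLS step:

```r
library(Matrix)
library(glmnet)

# One LASSO path per county (left-hand-side variable), on the sparse design
fits <- lapply(seq_len(ncol(Y)), function(k) {
  glmnet(X_sparse, Y[, k], family = "gaussian", alpha = 1)
})
```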

mm3509 avatar Feb 13 '18 17:02 mm3509

@miguelmorin Take a look at this new package for estimating VAR models: http://www.wbnicholson.com/BigVAR.html

TuSKan avatar Sep 18 '18 02:09 TuSKan