
Lasso.jl for big-ish data

Open CorySimon opened this issue 8 years ago • 2 comments

My entire design matrix cannot fit in memory. Much like SGDRegressor.partial_fit() in scikit-learn (see here), can I use Lasso.jl to fit incrementally, feeding in batches of data at a time? I realize that this will likely not converge to the same parameters as if all the data fit in memory.

Maybe one way to train in batches would be to modify criterion in fit() to stop after a certain number of iterations?
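For illustration, here is a minimal sketch of the kind of batched update described above, written as plain proximal gradient (ISTA) with soft-thresholding rather than Lasso.jl's coordinate descent. None of these names are Lasso.jl API; the fixed step size eta and the partial_fit! helper are hypothetical, and each call consumes one batch of rows that fits in memory.

# Hypothetical sketch, not Lasso.jl API: mini-batch proximal gradient (ISTA) for the lasso.
soft_threshold(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

function partial_fit!(beta, Xb, yb; lambda = 0.1, eta = 1e-3)
    n = size(Xb, 1)
    grad = Xb' * (Xb * beta - yb) ./ n                          # gradient of the batch least-squares loss
    beta .= soft_threshold(beta .- eta .* grad, eta * lambda)   # prox step for lambda*||beta||_1
    return beta
end

# Usage sketch: stream batches from disk, one epoch at a time
# (each_batch is a placeholder for whatever loads your data).
# beta = zeros(100)
# for epoch in 1:10, (Xb, yb) in each_batch(...)
#     partial_fit!(beta, Xb, yb)
# end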

CorySimon avatar Mar 03 '17 22:03 CorySimon

Did you solve your problem? I do not know if this helps, as it has been quite some time. But you can parallelize using ADMM: you iteratively refit the lasso on separate parts of the data, and the partial solutions are tied together through a shared consensus variable. A sketch of the consensus form is below.
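For concreteness, here is a minimal sketch of that consensus-ADMM scheme in the scaled form from Boyd et al.; all names here are illustrative, not from any package. Each chunk of rows gets its own ridge-regularized least-squares subproblem (these are independent, hence parallelizable), and the L1 penalty is applied once, to the shared consensus variable.

using LinearAlgebra, Statistics

soft(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

# Sketch of consensus ADMM for the lasso over K data chunks (A[i], y[i]).
function consensus_lasso(A, y; lambda = 1.0, rho = 1.0, iters = 100)
    K = length(A)
    p = size(A[1], 2)
    x = [zeros(p) for _ in 1:K]   # per-chunk solutions
    u = [zeros(p) for _ in 1:K]   # per-chunk scaled dual variables
    z = zeros(p)                  # shared consensus variable
    for _ in 1:iters
        for i in 1:K              # independent ridge subproblems -> parallelizable
            x[i] = (A[i]' * A[i] + rho * I) \ (A[i]' * y[i] + rho * (z - u[i]))
        end
        z = soft(mean(x) + mean(u), lambda / (rho * K))  # soft-thresholded average
        for i in 1:K
            u[i] .+= x[i] .- z    # dual ascent step
        end
    end
    return z
end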

rakeshvar avatar Jul 16 '18 00:07 rakeshvar

Here's an implementation of ADMM to get you started, in case you still need it: https://github.com/baggepinnen/LPVSpectral.jl/blob/724561469a483aa1ffae6fa76b73c67ed2becce7/src/lasso.jl#L118

The functions above that line define the proximal operators that are passed to ADMM to solve the LASSO problem.

It's used like this:

using LPVSpectral, ProximalOperators
A = randn(70, 100);                     # under-determined design matrix: 70 samples, 100 features
x = randn(100) .* (rand(100) .< 0.05);  # sparse ground truth, roughly 5% nonzero
y = A * x;                              # noiseless observations
proxF = LeastSquares(A, y)              # prox operator of the data-fit term 0.5*||A*x - y||^2
xh, zh = LPVSpectral.ADMM(randn(100), proxF, NormL1(3))  # NormL1(3): prox of 3*||x||_1
[x zh]                                  # compare true and estimated coefficients column by column

baggepinnen avatar Feb 19 '20 04:02 baggepinnen