Lasso.jl for big-ish data
My entire design matrix cannot fit in memory.
Much like SGDRegressor.partial_fit() in scikit-learn (see here), can I use Lasso.jl to fit in epochs, feeding batches of data at a time? I realize that this will likely not converge to the same parameters as if the data could all fit in memory.
Maybe one way to train in batches would be to modify criterion in fit() to stop after a certain number of iterations?
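As a rough illustration of what batch-wise fitting could look like outside Lasso.jl, here is a minimal sketch of mini-batch proximal gradient descent on the lasso objective. It is not Lasso.jl's API or its coordinate-descent algorithm. The names `partial_fit!` and `fit_in_epochs` and the in-memory `batches` vector are made up for the example (a real batch source would read each batch from disk), and the step size `η` is picked by hand.

```julia
using LinearAlgebra, Random

# Soft-thresholding: the proximal operator of t * ||.||_1.
soft_threshold(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

# One proximal-gradient step on a single batch (Xb, yb) for
#   (1/2n) * ||Xb*β - yb||^2 + λ * ||β||_1,   n = number of rows in the batch.
# β is updated in place; η is the step size.
function partial_fit!(β, Xb, yb; λ=0.1, η=0.01)
    grad = Xb' * (Xb * β - yb) ./ size(Xb, 1)
    β .= soft_threshold(β .- η .* grad, η * λ)
    return β
end

# Sweep over all batches `epochs` times, holding only one batch at a time.
function fit_in_epochs(batches, p; epochs=50, λ=0.1, η=0.01)
    β = zeros(p)
    for _ in 1:epochs
        for (Xb, yb) in batches
            partial_fit!(β, Xb, yb; λ=λ, η=η)
        end
    end
    return β
end

# Toy data: 10 noise-free batches of 50 rows, 20 features, 2 of them active.
Random.seed!(1)
βtrue = [2.0; -3.0; zeros(18)]
batches = map(1:10) do _
    X = randn(50, 20)
    (X, X * βtrue)
end
β = fit_in_epochs(batches, 20; epochs=200, λ=0.01, η=0.1)
```

On this toy problem the estimate ends up close to `βtrue`. As noted in the question, with noisy data and a fixed step size the result will generally not match a full in-memory fit exactly.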
Did you solve your problem? I do not know if this helps, as it has been quite some time, but you can parallelize using ADMM: you iteratively refit the lasso on parts of the data and combine the results.
Here's an implementation of ADMM to get you started, in case you still need it: https://github.com/baggepinnen/LPVSpectral.jl/blob/724561469a483aa1ffae6fa76b73c67ed2becce7/src/lasso.jl#L118
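To make "refit on parts of the data iteratively" concrete, here is a minimal sketch of consensus ADMM for the lasso with the rows split into blocks. It is written in plain Julia and is not taken from Lasso.jl or any other package. The function name `consensus_lasso`, the penalty parameter `ρ`, and the in-memory `blocks` vector are assumptions made for the example; with out-of-core data each block would be loaded once to build its cached factorization.

```julia
using LinearAlgebra, Random

soft_threshold(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

# Consensus ADMM for
#   minimize (1/2) * sum_i ||A_i*x - b_i||^2 + λ * ||x||_1
# where each (A_i, b_i) is one block of rows of the full problem.
function consensus_lasso(blocks, p; λ=1.0, ρ=1.0, iters=200)
    N = length(blocks)
    # Per-block quantities are p-by-p and length p, so they can be cached
    # even if the blocks themselves are too large to keep around.
    facs = [cholesky(Symmetric(A' * A + ρ * I)) for (A, b) in blocks]
    Atb  = [A' * b for (A, b) in blocks]
    xs = [zeros(p) for _ in 1:N]   # local estimates, one per block
    us = [zeros(p) for _ in 1:N]   # scaled dual variables
    z = zeros(p)                   # shared (consensus) estimate
    for _ in 1:iters
        for i in 1:N               # independent ridge-like solves, could run in parallel
            xs[i] = facs[i] \ (Atb[i] + ρ * (z - us[i]))
        end
        z = soft_threshold(sum(xs) / N + sum(us) / N, λ / (ρ * N))
        for i in 1:N
            us[i] += xs[i] - z
        end
    end
    return z
end

# Toy data: 4 noise-free blocks of 30 rows, 10 features, 2 of them active.
Random.seed!(1)
xtrue = [1.5; -2.0; zeros(8)]
blocks = map(1:4) do _
    A = randn(30, 10)
    (A, A * xtrue)
end
z = consensus_lasso(blocks, 10; λ=0.1, ρ=20.0, iters=300)
```

Unlike a single pass of batch-wise gradient steps, this iteration targets the solution of the full lasso problem, at the cost of storing one p-by-p factorization per block.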
The functions above specify the prox operators that are passed to ADMM to solve the LASSO problem.
It is used like this:
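using LPVSpectral, ProximalOperators
A = randn(70,100);                      # design matrix: 70 observations, 100 features
x = randn(100) .* (rand(100) .< 0.05);  # sparse ground truth, roughly 5% nonzero entries
y = A*x;                                # noise-free observations
proxF = LeastSquares(A,y)               # least-squares data-fit term
xh,zh = LPVSpectral.ADMM(randn(100), proxF, NormL1(3))  # random initial guess, ℓ1 penalty with weight 3
[x zh]                                  # true coefficients next to the estimate zh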
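For reference, the two prox operators passed to ADMM above can be written out by hand. This is a sketch in plain Julia, not the ProximalOperators.jl implementation. The names `prox_l1` and `prox_ls` are made up, and it assumes that `LeastSquares(A,y)` stands for (1/2)||A*x - y||^2 and `NormL1(3)` for 3*||x||_1, with `γ` the prox step size.

```julia
using LinearAlgebra

# prox of γ * λ * ||x||_1: elementwise soft-thresholding at level γ*λ
prox_l1(v, λ, γ) = sign.(v) .* max.(abs.(v) .- γ * λ, 0)

# prox of γ * (1/2)||A*x - y||^2: the minimizer of
#   (1/2)||A*x - y||^2 + (1/(2γ))||x - v||^2,
# i.e. the solution of (A'A + I/γ) x = A'y + v/γ
prox_ls(v, A, y, γ) = (A' * A + I / γ) \ (A' * y + v / γ)

# Small check on hand-picked numbers.
A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [1.0, 2.0, 3.0]
v = [0.5, -0.5]
xls = prox_ls(v, A, y, 0.5)
xl1 = prox_l1([3.0, -0.5, 1.0], 1.0, 1.0)  # entries with |v| <= 1 become zero
```

ADMM alternates between these two maps, which is why the second returned variable (`zh` above, the output of the ℓ1 prox) is the one with exact zeros.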