Lasso.jl
Divergence from GLMnet when using a matrix with many variables
When fitting models with a large number of variables, Lasso.jl and GLMNet return different paths, and the difference grows as the number of variables increases.
An example to illustrate this:
using Lasso, GLMNet, Statistics
# fits identical models in Lasso and GLMNet from mock data
# and returns the mean absolute difference of the betas of both models
function lasso_glmnet_dif(nrow, ncol, n_col_contributing)
data = rand(nrow, ncol)
outcome = mean(data[:, 1:n_col_contributing], dims = 2)[:, 1] .> rand(nrow) # per-row mean of the contributing columns vs a random threshold
presence_matrix = [1 .- outcome outcome]
l = Lasso.fit(LassoPath, data, outcome, Binomial())
g = GLMNet.glmnet(data, presence_matrix, Binomial())
lcoefs = Vector(l.coefs[:,end])
gcoefs = g.betas[:, end]
mean(abs, lcoefs .- gcoefs)
end
# 1000 records, 5 variables that all contribute to outcome
lasso_glmnet_dif(1000, 5, 5) # order of magnitude 1e-9
# 1000 records, 1000 variables of which 5 contribute to the outcome
lasso_glmnet_dif(1000, 1000, 5) # around 0.05
The context for this problem is that I'm working on a Julia implementation of maxnet, where a biggish model matrix (hundreds of columns) is generated and a lasso path is used to select the most important columns.
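For context, the selection step looks roughly like this (a simplified sketch, not the actual maxnet code; it just keeps the columns whose betas are nonzero at the final segment):
# simplified sketch of the selection step: keep columns whose coefficient
# at the final (smallest) λ is nonzero
betas = Vector(l.coefs[:, end])
selected = findall(!iszero, betas) # indices of retained model-matrix columns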
The packages may be generating different sequences of regularization lambdas. You can get them from the Lasso path as l.λ and from GLMNet as g.lambda.
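For example, a quick check along these lines (a minimal sketch reusing the l and g objects from the function above) would show whether the two λ sequences line up:
# compare the λ sequences produced by the two packages
k = min(length(l.λ), length(g.lambda)) # the paths may have different lengths
maximum(abs, l.λ[1:k] .- g.lambda[1:k]) # largest discrepancy between the sequences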
Also, in your example it looks like you are picking the last coefs of the regularization path, but those are not necessarily the most interesting ones. Take a look at the docs here.
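For instance, if I remember the Lasso.jl interface correctly, something like the following picks a segment by an information criterion instead of taking the last one (the select keyword and the MinAICc selector are from memory, so treat them as an assumption):
# coefficients at the AICc-minimizing segment rather than the last one
# (assumes Lasso.jl's coef(path; select = ...) interface and the MinAICc selector)
best = coef(l; select = MinAICc())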
Hi, thanks for chipping in.
I don't think the regularization lambdas are where the differences come from. In the maxnet algorithm the lambdas are generated inside the algorithm, not by the packages, and even if I force the lambdas to be identical I see the same kind of behaviour.
The same holds when I look at other parts of the regularization path (maxnet always takes the last segment, but I see your point).
E.g. in this example I take the coefficients halfway along the path and force the lambdas to be identical, and lasso_glmnet_dif(1000, 1000, 5) is still around 0.02.
That seems like too big a difference to come from floating point errors, which leads me to think the algorithms are somehow different?
function lasso_glmnet_dif(nrow, ncol, n_col_contributing)
data = rand(nrow, ncol)
outcome = mean(data[:, 1:n_col_contributing], dims = 2)[:, 1] .> rand(nrow) # per-row mean of the contributing columns vs a random threshold
presence_matrix = [1 .- outcome outcome]
l = Lasso.fit(LassoPath, data, outcome, Binomial())
g = GLMNet.glmnet(data, presence_matrix, Binomial(); lambda = l.λ)
lcoefs = Vector(l.coefs[:,floor(Int,end/2)])
gcoefs = g.betas[:, floor(Int,end/2)]
mean(abs, lcoefs .- gcoefs)
end
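Calling this version the same way as before:
# same comparison with identical λ sequences, coefficients taken mid-path
lasso_glmnet_dif(1000, 1000, 5) # still around 0.02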