GLMNet.jl
Allow GLM style specification of Bernoulli outcomes for logistic regression
I often like to use `glm(X, y)`, where `y` is a vector of 0's and 1's, and thought it would be nice to offer the same interface for `glmnet`. This patch makes some quick changes to allow that. Let me know if you'd like me to handle the changes another way, since I didn't spend much time thinking about the cleanest way to add this functionality.
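One way a patch like this could work is to convert the 0/1 vector into a two-column count format before handing it to the existing logistic fit. The sketch below assumes that glmnet's binomial fit accepts an n×2 matrix with the number of 0's (negative responses) in the first column and the number of 1's (positive responses) in the second. `bernoulli_to_counts` is a hypothetical helper, not a GLMNet.jl function, and this is not necessarily how the patch itself does it.

```julia
# Hypothetical helper (not part of GLMNet.jl): convert a GLM-style 0/1 outcome
# vector into a two-column count matrix. ASSUMPTION: the logistic fit expects
# column 1 = number of 0's and column 2 = number of 1's for each row of X.
function bernoulli_to_counts(y::AbstractVector{<:Real})
    counts = zeros(Float64, length(y), 2)
    for (i, yi) in enumerate(y)
        if yi == 0
            counts[i, 1] = 1.0      # one failure in row i
        elseif yi == 1
            counts[i, 2] = 1.0      # one success in row i
        else
            throw(ArgumentError("y must contain only 0's and 1's; got $yi at index $i"))
        end
    end
    return counts
end

# Usage: each observation becomes one row holding a single failure or success.
y = [0, 1, 1, 0]
counts = bernoulli_to_counts(y)
# counts == [1.0 0.0; 0.0 1.0; 0.0 1.0; 1.0 0.0]
```

A vector method such as `glmnet(X, y::AbstractVector, Binomial())` could then forward `bernoulli_to_counts(y)` to the matrix-based method. Rejecting anything other than 0 and 1 up front gives a clearer error than letting bad values reach the solver.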
The script below gives a basic demo of the extended functionality. I can make it into a test:
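```julia
using GLMNet
using Distributions
using LinearAlgebra  # norm
using Random         # Random.seed!
using Test           # @test

Random.seed!(1)

invlogit(z::Real) = 1 / (1 + exp(-z))

# Simulate a logistic model with an intercept and p = 2 predictors.
n, p = 250_000, 2
intercept = randn()
beta = randn(p)
X = randn(n, p)

# Replace each linear predictor with a 0/1 draw from the implied Bernoulli distribution.
y = X * beta
for i in 1:n
    y[i] = rand(Bernoulli(invlogit(intercept + y[i])))
end

# Fit with a plain 0/1 outcome vector, GLM-style.
path = glmnet(X, y, Binomial())

# The last point on the path has the smallest lambda, so it should be close to the true parameters.
@test abs(intercept - path.a0[end]) < 0.1
@test norm(beta - convert(Matrix{Float64}, path.betas)[:, end]) < 0.1
```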
I agree that this is an API worth having. I had thought about this previously, but got bogged down in the implementation. If `X` has a lot of duplicate rows, I think fitting would be faster if we pooled them before calling `lognet`, i.e., collapsed each set of identical rows into a single row with its counts of 0's and 1's, so the solver works on one row per distinct covariate pattern instead of one per observation. This boils down to finding the unique rows of `X`. Doing that without allocating for each row seemed possible, but required more code than I was prepared to write at the time. Let's start with this approach and worry about efficiency later.
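The pooling idea can be sketched as follows. `pool_duplicate_rows` is a hypothetical helper, not part of GLMNet.jl, and the output format assumes the same two-column count matrix as above (0's in column 1, 1's in column 2). It takes the simple route of copying each row to use as a dictionary key, so it allocates once per row; an allocation-free version would need more code.

```julia
# Hypothetical sketch: pool duplicate rows of X before a logistic fit.
# Returns (Xu, counts): one row of Xu per distinct row of X, in order of first
# appearance, and counts[j, :] = (number of 0's, number of 1's) for that row.
# ASSUMPTION: the fit accepts this two-column count layout.
function pool_duplicate_rows(X::AbstractMatrix, y::AbstractVector)
    n = size(X, 1)
    length(y) == n || throw(DimensionMismatch("X has $n rows but y has $(length(y)) entries"))
    T = eltype(X)
    position = Dict{Vector{T}, Int}()  # distinct row => index into the pooled arrays
    uniq = Vector{Vector{T}}()         # distinct rows, in order of first appearance
    n0 = Float64[]                     # number of 0 outcomes per distinct row
    n1 = Float64[]                     # number of 1 outcomes per distinct row
    for i in 1:n
        row = X[i, :]                  # copies row i: one allocation per row
        j = get(position, row, 0)
        if j == 0                      # first time this row has been seen
            push!(uniq, row)
            push!(n0, 0.0)
            push!(n1, 0.0)
            j = length(uniq)
            position[row] = j
        end
        if y[i] == 1
            n1[j] += 1
        elseif y[i] == 0
            n0[j] += 1
        else
            throw(ArgumentError("y must contain only 0's and 1's; got $(y[i]) at index $i"))
        end
    end
    Xu = permutedims(reduce(hcat, uniq))  # stack the distinct rows back into a matrix
    return Xu, hcat(n0, n1)
end

# Usage: the row [1.0 2.0] appears three times, [3.0 4.0] once.
X = [1.0 2.0; 1.0 2.0; 3.0 4.0; 1.0 2.0]
y = [1, 0, 1, 1]
Xu, counts = pool_duplicate_rows(X, y)
# Xu == [1.0 2.0; 3.0 4.0]
# counts == [1.0 2.0; 0.0 1.0]
```

The speedup only appears when `X` really has many repeated rows, e.g. categorical or binary predictors. With a continuous design like the `randn` matrix in the demo script, essentially every row is distinct and pooling just adds overhead.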
I'm going to come back to this tomorrow. I've had some problems getting the solver to converge recently and need to delve deeper.