GLMNet.jl
Logistic regression fails if y is a vector of strings
From the README:
For logistic models, y is either a string vector or a m x 2 matrix
But the following doesn't work:
using GLMNet
y = ["M", "B", "M", "B"]
X = rand(4, 10)
glmnet(X, y, Binomial())
MethodError: no method matching glmnet(::Matrix{Float64}, ::Vector{String}, ::Binomial{Float64})
Closest candidates are:
glmnet(::AbstractMatrix{T} where T, ::AbstractVector{T} where T, ::AbstractVector{T} where T) at /home/users/bbchu/.julia/packages/GLMNet/C8WKF/src/CoxNet.jl:151
glmnet(::AbstractMatrix{T} where T, ::AbstractVector{T} where T, ::AbstractVector{T} where T, ::CoxPH; kw...) at /home/users/bbchu/.julia/packages/GLMNet/C8WKF/src/CoxNet.jl:151
glmnet(::Matrix{Float64}, ::Vector{Float64}, ::Distribution; kw...) at /home/users/bbchu/.julia/packages/GLMNet/C8WKF/src/GLMNet.jl:485
...
Fortunately, if y is a matrix with 2 columns, it does work:
y = [1 0; 0 1; 0 1; 1 0]
X = rand(4, 10)
glmnet(X, y, Binomial())
Logistic GLMNet Solution Path (100 solutions for 10 predictors in 833 passes):
────────────────────────────────
df pct_dev λ
────────────────────────────────
[1] 0 0.0 0.476672
[2] 1 0.0582906 0.455006
[3] 1 0.11166 0.434325
[4] 1 0.160737 0.414585
[5] 1 0.206039 0.395741
[6] 1 0.248 0.377754
[7] 1 0.286986 0.360585
...
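Until the string-vector path accepts a distribution, one workaround is to build the two-column matrix from the labels yourself and pass it as in the matrix example above. This is a minimal plain-Julia sketch; `labels_to_binomial_matrix` is a hypothetical helper, not part of GLMNet.jl, and it assumes exactly two distinct labels with columns ordered by `sort(unique(y))`:

```julia
# Hypothetical helper (not part of GLMNet.jl): convert a vector with two
# distinct class labels into an m x 2 indicator matrix, the form that
# glmnet(X, y, Binomial()) accepts for logistic models.
function labels_to_binomial_matrix(y::AbstractVector)
    levels = sort(unique(y))
    length(levels) == 2 ||
        throw(ArgumentError("expected exactly 2 distinct labels, got $(length(levels))"))
    # Row i gets 1.0 in the column of its label and 0.0 in the other column.
    Y = zeros(Float64, length(y), 2)
    for (i, label) in enumerate(y)
        Y[i, findfirst(==(label), levels)] = 1.0
    end
    return Y, levels
end

y = ["M", "B", "M", "B"]
Y, levels = labels_to_binomial_matrix(y)
# levels == ["B", "M"], so column 1 marks "B" and column 2 marks "M"
```

`Y` can then be passed as `glmnet(X, Y, Binomial())`. Returning `levels` alongside the matrix keeps track of which label each column represents.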
It looks like the method that supports the string-vector input is this one:
https://github.com/JuliaStats/GLMNet.jl/blob/8eff4c4f07374c6f6f7878b16dc02e90d444e9a1/src/Multinomial.jl#L191-L203
So this works:
using GLMNet
y = ["M", "B", "M", "B"]
X = rand(4, 10)
glmnet(X, y)
It doesn't need a distribution because it chooses between Binomial and Multinomial based on the number of unique values in y. This method could probably be extended to accept a distribution argument and, I guess, throw an error if the distribution and y are incompatible.
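The proposed behavior could look roughly like the sketch below. It is illustrative only: `resolve_family` is a hypothetical name, the symbols `:binomial` and `:multinomial` stand in for Distributions.jl's `Binomial()` and `Multinomial()` so the snippet runs without any packages, and it assumes that two distinct labels map to Binomial and more than two map to Multinomial when no distribution is given.

```julia
# Hypothetical sketch (not GLMNet.jl code): pick or validate the model family
# from the labels in y. Symbols stand in for Binomial()/Multinomial().
function resolve_family(y::AbstractVector, family::Union{Symbol,Nothing}=nothing)
    k = length(unique(y))
    k >= 2 || throw(ArgumentError("y must contain at least 2 distinct labels"))
    # No family given: choose from the number of classes (assumed rule).
    if family === nothing
        return k == 2 ? :binomial : :multinomial
    end
    family in (:binomial, :multinomial) ||
        throw(ArgumentError("unsupported family: $family"))
    # A user-supplied family must agree with the labels in y.
    if family === :binomial && k != 2
        throw(ArgumentError("Binomial requires exactly 2 distinct labels, got $k"))
    end
    return family
end

resolve_family(["M", "B", "M", "B"])             # :binomial
resolve_family(["M", "B", "M", "B"], :binomial)  # :binomial
# resolve_family(["a", "b", "c"], :binomial) throws an ArgumentError
```

In the package itself this would more naturally be a new `glmnet` method that dispatches on the distribution's type rather than on symbols.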
At the very least, the README should be updated to say that a string vector y is only accepted by glmnet(X, y), without a distribution argument.