GLM.jl
Deviance NaN from first iteration onwards with Gamma distributed GLM with LogLink
I am trying to fit a Gamma-distributed GLM with a LogLink function, but the deviance and diff.dev. are both NaN from the first iteration. I have added the data files below (hopefully that worked correctly; they are pretty small):
Reproducible example using the data above:
using DelimitedFiles, GLM
X = readdlm("x.txt")
y = readdlm("y.txt")
y = reshape(y, 1000) # otherwise y is a 1000 × 1 Matrix, not a Vector
glm(X, y, Gamma(), LogLink(), maxiter=5, verbose = true)
The output I get:
Iteration: 1, deviance: NaN, diff.dev.:NaN
Iteration: 2, deviance: NaN, diff.dev.:NaN
Iteration: 3, deviance: NaN, diff.dev.:NaN
Iteration: 4, deviance: NaN, diff.dev.:NaN
Iteration: 5, deviance: NaN, diff.dev.:NaN
Running this for any number of iterations leads to the error:
failure to converge after 5 iterations.
_fit!(m::GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Gamma{Float64}, LogLink}, GLM.DensePredChol{Float64, Cholesky{Float64, Matrix{Float64}}}}, verbose::Bool, maxiter::Int64, minstepfac::Float64, atol::Float64, rtol::Float64, start::Nothing) at glmfit.jl:339
#fit!#12 at glmfit.jl:372 [inlined]
fit! at glmfit.jl:352 [inlined]
fit(::Type{GeneralizedLinearModel}, X::Matrix{Float64}, y::Vector{Float64}, d::Gamma{Float64}, l::LogLink; dofit::Bool, wts::Vector{Float64}, offset::Vector{Float64}, fitargs::Base.Iterators.Pairs{Symbol, Integer, Tuple{Symbol, Symbol}, NamedTuple{(:maxiter, :verbose), Tuple{Int64, Bool}}}) at glmfit.jl:468
(::StatsBase.var"#fit##kw")(::NamedTuple{(:maxiter, :verbose), Tuple{Int64, Bool}}, ::typeof(fit), ::Type{GeneralizedLinearModel}, X::Matrix{Float64}, y::Vector{Float64}, d::Gamma{Float64}, l::LogLink) at glmfit.jl:462
glm(::Matrix{Float64}, ::Vector{Float64}, ::Gamma{Float64}, ::Vararg{Any, N} where N; kwargs::Base.Iterators.Pairs{Symbol, Integer, Tuple{Symbol, Symbol}, NamedTuple{(:maxiter, :verbose), Tuple{Int64, Bool}}}) at glmfit.jl:484
(::GLM.var"#glm##kw")(::NamedTuple{(:maxiter, :verbose), Tuple{Int64, Bool}}, ::typeof(glm), ::Matrix{Float64}, ::Vector{Float64}, ::Gamma{Float64}, ::Vararg{Any, N} where N) at glmfit.jl:484
top-level scope at sandbox.jl:229
eval at boot.jl:360 [inlined]
I will try to dig into this and narrow it down a bit further, but I am not very familiar with GLMs, so any help would be much appreciated!
The NaNs seem to be introduced in updateμ!(r::GlmResp{V,D,L}) where {V<:FPVector,D,L}. Specifically, line 105, μi, dμdη = inverselink(L(), η[i]), introduces NaNs if η is too large.
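To make the failure mode concrete, here is a minimal sketch (just the arithmetic, not the package's actual code path): for a LogLink, inverselink is essentially exp, and once the linear predictor overflows, the Gamma unit deviance degenerates to NaN:

η = 1000.0
μ = exp(η)                        # Inf: exp overflows for large η
y = 2.0
log(y / μ)                        # -Inf, since y / μ == 0.0
(y - μ) / μ                       # NaN, since -Inf / Inf is NaN
-2 * (log(y / μ) - (y - μ) / μ)   # NaN: the Gamma unit deviance is poisoned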
I think the issue is just that it does exp (the inverse of log) and the values are too large, leading to Infs. I presume I'll be able to fix this by scaling my variables. The behaviour of looping through all the iterations and then throwing an error saying "failure to converge after x iterations" could probably be clearer, though.
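For anyone hitting the same thing, this is roughly what the scaling workaround could look like (a sketch only; it assumes every column of X is a continuous predictor, so an intercept or dummy column would need to be left unscaled):

using Statistics

# Standardise each column so the linear predictor Xβ stays in a range
# where exp() does not overflow during the IRLS iterations.
Xs = (X .- mean(X, dims=1)) ./ std(X, dims=1)
glm(Xs, y, Gamma(), LogLink(), maxiter=30, verbose=true)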
Hi there,
I also face a similar problem with a Bernoulli/LogitLink model. It seems the NaNs originate in the delbeta! method, at least in mul!(p.delbeta, transpose(scr), r).
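For what it's worth, a quick sanity check on the inputs can rule out non-finite data, which is a common way for NaNs to reach that mul! call (this snippet assumes X and y hold your design matrix and response):

any(!isfinite, X) && @warn "X contains non-finite values"
any(!isfinite, y) && @warn "y contains non-finite values"
all(v -> 0 <= v <= 1, y) || @warn "Bernoulli responses must lie in [0, 1]"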
Do you have any idea what could be causing the problem? Happy to provide more information. I can probably also provide the data to reproduce the problem if that helps; if so, please let me know how best to do this.
Thanks!