Gamma models convergence checks and run time
I am trying to fit a Gamma model to reaction time data (a recent paper argues against transforming RTs and for using GLMMs with a Gamma or Inverse Gaussian family --> http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01171/full )
I have a few questions about running GLMMs in Julia, but before asking them I will go over a reproducible example using a known dataset (my dataset is large and takes too long to run). I am new to GLMMs and Julia, so I might have done something wrong. Let me know if that's the case.
Using Baayen's data from the languageR package in R, I ran 4 models, 3 of which do not make sense for this data. There are a number of variables, but I will focus on native language (English and Other) and word class (animal and plant). Also, I've converted the RTs back to raw RTs, as in the dataset they are log-transformed.
using DataArrays, DataFrames, MixedModels, GLM, RDatasets, RData
#Baayen's data from the languageR package in R, saved as .csv from R
lexdec=readtable("lexdec.csv", makefactors=true)
lexdec_1 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class*NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
ERROR: MethodError: no method matching getindex(::DataFrames.DataFrame, ::Expr)
lexdec_2 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
lexdec_3 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
lexdec_4 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
lexdec_1 did not run; I suppose interaction terms are not supported in random-effects terms. lexdec_2, 3, and 4 ran without a problem. However, lexdec_2 and 3 do not make sense for this data, as random slopes for native language are inappropriate here (each person has only one language background, so there is no data to estimate how that person would behave if they had the other language background). Only lexdec_4 is a valid model here. Nevertheless, all models except lexdec_1 ran just fine. I also ran the models using lmm in Julia, with identical results on convergence.
For comparison I ran the Gamma models in R using lme4 and glmmTMB. glmer from lme4 successfully ran all 4 models (including the interaction model). glmmTMB gave convergence warnings on lexdec_1, 2, and 4, but no warning on lexdec_3.
I observed different results when running the lmm models in R using lme4 and glmmTMB. lmer from lme4 gave warnings on lexdec_1 and 3, while glmmTMB gave warnings on lexdec_1, 2, and 3 (was glmmTMB able to catch the inappropriate models?)
My questions:
- Are there any convergence checks? I can run any model with any parameters without getting any convergence warnings.
- Are interactions of random effects supported?
- What can I do to decrease run time? My dataset has ~20k observations, two within-subjects factors with 3 levels each, and one between-subjects factor with 2 levels.
- Are there any plans to implement support for the Inverse Gaussian family? I ran Inverse Gaussian models using lme4 and glmmTMB with identical results, so I guess there is not much of a difference for RT data.
I am running Arch Linux 4.12.3-1-ARCH with openblas-lapack installed from the AUR.
julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b* (2017-06-19 13:05 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (HASWELL)
LAPACK: liblapack
LIBM: libm
LLVM: libLLVM-4.0.0 (ORCJIT, haswell)
julia> Pkg.status()
- MixedModels 0.10.0
- GLM 0.7.0
Thank you for opening what is indeed an interesting issue. I can see that the results from lme4 and from this package do not coincide. What I don't know is which one, if either, is correct.
It would probably help if @bbolker could take a look at this too.
Just to make sure that we are on the same page, when I copy the lexdec data from the languageR package, I don't have an rt_raw column.
julia> using RCall
julia> lexdec = rcopy(R"languageR::lexdec");
julia> dump(lexdec)
DataFrames.DataFrame 1659 observations of 28 variables
Subject: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["A1", "A1", "A1", "A1"]
RT: DataArrays.DataArray{Float64,1}(1659) [6.34036, 6.3081, 6.34914, 6.18621]
Trial: DataArrays.DataArray{Int32,1}(1659) Int32[23, 27, 29, 30]
Sex: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["F", "F", "F", "F"]
NativeLanguage: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["English", "English", "English", "English"]
Correct: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["correct", "correct", "correct", "correct"]
PrevType: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["word", "nonword", "nonword", "word"]
PrevCorrect: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["correct", "correct", "correct", "correct"]
Word: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["owl", "mole", "cherry", "pear"]
Frequency: DataArrays.DataArray{Float64,1}(1659) [4.85981, 4.60517, 4.99721, 4.72739]
FamilySize: DataArrays.DataArray{Float64,1}(1659) [1.38629, 1.09861, 0.693147, 0.0]
SynsetCount: DataArrays.DataArray{Float64,1}(1659) [0.693147, 1.94591, 1.60944, 1.09861]
Length: DataArrays.DataArray{Int32,1}(1659) Int32[3, 4, 6, 4]
Class: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["animal", "animal", "plant", "plant"]
FreqSingular: DataArrays.DataArray{Int32,1}(1659) Int32[54, 69, 83, 44]
FreqPlural: DataArrays.DataArray{Int32,1}(1659) Int32[74, 30, 49, 68]
DerivEntropy: DataArrays.DataArray{Float64,1}(1659) [0.7912, 0.6968, 0.4754, 0.0]
Complex: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["simplex", "simplex", "simplex", "simplex"]
rInfl: DataArrays.DataArray{Float64,1}(1659) [-0.310155, 0.814508, 0.518794, -0.427444]
meanRT: DataArrays.DataArray{Float64,1}(1659) [6.3582, 6.415, 6.3426, 6.3353]
SubjFreq: DataArrays.DataArray{Float64,1}(1659) [3.12, 2.4, 3.88, 4.52]
meanSize: DataArrays.DataArray{Float64,1}(1659) [3.4758, 2.9999, 1.6278, 1.9908]
meanWeight: DataArrays.DataArray{Float64,1}(1659) [3.1806, 2.6112, 1.2081, 1.6114]
BNCw: DataArrays.DataArray{Float64,1}(1659) [12.0571, 5.73881, 5.71652, 2.05037]
BNCc: DataArrays.DataArray{Float64,1}(1659) [0.0, 4.06225, 3.2498, 1.46241]
BNCd: DataArrays.DataArray{Float64,1}(1659) [6.1756, 2.85028, 12.5887, 7.36322]
BNCcRatio: DataArrays.DataArray{Float64,1}(1659) [0.0, 0.707856, 0.568493, 0.713242]
BNCdRatio: DataArrays.DataArray{Float64,1}(1659) [0.512198, 0.496667, 2.20217, 3.59117]
julia> R"""packageVersion("languageR")"""
RCall.RObject{RCall.VecSxp}
[1] ‘1.4.1’
I am going to work on the assumption that the RT column here is what you have as rt_raw. If I try to fit a GLM I get similar results in R
julia> R"""summary(glm(RT ~ 1 + Class * NativeLanguage, family=Gamma(link="identity"), data=languageR::lexdec))"""
RCall.RObject{RCall.VecSxp}
Call:
glm(formula = RT ~ 1 + Class * NativeLanguage, family = Gamma(link = "identity"),
data = languageR::lexdec)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.091631 -0.023924 -0.005263 0.017681 0.165656
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.313149 0.009814 643.297 <2e-16 ***
Classplant 0.011648 0.014759 0.789 0.4301
NativeLanguageOther 0.174059 0.015228 11.430 <2e-16 ***
Classplant:NativeLanguageOther -0.041166 0.022855 -1.801 0.0719 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Gamma family taken to be 0.001275883)
Null deviance: 2.3196 on 1658 degrees of freedom
Residual deviance: 2.0737 on 1655 degrees of freedom
AIC: -222.26
Number of Fisher Scoring iterations: 3
and using the GLM package in Julia
julia> using DataFrames, GLM, MixedModels
julia> glm(@formula(RT ~ 1 + Class * NativeLanguage), lexdec, Gamma(), GLM.IdentityLink())
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.IdentityLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: RT ~ 1 + Class + NativeLanguage + Class & NativeLanguage
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) 6.31315 0.00981374 643.297 <1e-99
Class: plant 0.0116479 0.0147591 0.789201 0.4300
NativeLanguage: Other 0.174059 0.0152283 11.43 <1e-29
Class: plant & NativeLanguage: Other -0.0411659 0.0228546 -1.80121 0.0717
julia> deviance(ans) # I should change the show method for GeneralizedLinearModel to include this
2.0737368935267377
So far, so good. I'm not quite sure why an identity link would be used here. The canonical link for the Gamma() distribution is InverseLink(), which doesn't really change much except for the values of the coefficients, which are now on the inverse scale.
julia> glm(@formula(RT ~ 1 + Class * NativeLanguage), lexdec, Gamma())
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: RT ~ 1 + Class + NativeLanguage + Class & NativeLanguage
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) 0.1584 0.000246231 643.298 <1e-99
Class: plant -0.000291713 0.000369552 -0.789368 0.4299
NativeLanguage: Other -0.00425004 0.00037039 -11.4745 <1e-29
Class: plant & NativeLanguage: Other 0.000996327 0.000557006 1.78872 0.0737
julia> deviance(ans)
2.0737368935267413
The deviance is essentially the same as before, which just indicates that the fitted values are over a very small range, so the identity and inverse links give essentially the same predictions.
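The near-equivalence of the two links can be seen with a little arithmetic (a sketch, independent of the model fits): over the observed range of RT (roughly 5.83 to 7.59 on the log scale), the inverse link is nearly linear in the mean, so a straight line on the identity scale reproduces it to about 1%.

```julia
# Over a narrow range of μ, the inverse link η = 1/μ is almost linear in μ,
# so identity- and inverse-link Gamma fits can produce essentially the same
# fitted values.
μ = collect(5.83:0.01:7.59)
η = 1 ./ μ                                  # inverse-link scale
# least-squares straight line a + b*μ approximating 1/μ over this range
mx = sum(μ) / length(μ)
my = sum(η) / length(η)
b = sum((μ .- mx) .* (η .- my)) / sum(abs2, μ .- mx)
a = my - b * mx
maxrel = maximum(abs.((a .+ b .* μ .- η) ./ η))  # worst-case relative error ≈ 1%
```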
In fact, we can simplify this model by eliminating the Class factor and its interaction with NativeLanguage.
julia> gm3 = glm(@formula(RT ~ 1 + NativeLanguage), lexdec, Gamma())
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: RT ~ 1 + NativeLanguage
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) 0.15827 0.000183705 861.547 <1e-99
NativeLanguage: Other -0.00380929 0.000276774 -13.7632 <1e-42
julia> deviance(gm3)
2.078180142969401
If we add Word as a fixed-effects term there isn't much of a difference in the deviance.
julia> gm4 = glm(@formula(RT ~ 1 + NativeLanguage + Word), lexdec, Gamma())
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: RT ~ 1 + NativeLanguage + Word
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) 0.157104 0.00115407 136.13 <1e-99
NativeLanguage: Other -0.00380861 0.000262157 -14.528 <1e-47
Word: ant 0.00269966 0.00163808 1.64807 0.0993
Word: apple 0.00467417 0.00164851 2.83539 0.0046
Word: apricot 0.00169193 0.00163277 1.03623 0.3001
Word: asparagus -0.00116379 0.00161784 -0.719348 0.4719
Word: avocado 0.000723696 0.0016277 0.444614 0.6566
Word: banana 0.00283276 0.00163878 1.72858 0.0839
Word: bat 0.00308949 0.00164013 1.88368 0.0596
Word: beaver 0.000316319 0.00162556 0.194591 0.8457
Word: bee 0.00333687 0.00164144 2.03289 0.0421
Word: beetroot -0.00144623 0.00161637 -0.894739 0.3709
Word: blackberry 0.000899666 0.00162862 0.552411 0.5807
Word: blueberry 0.000345391 0.00162571 0.212455 0.8318
Word: broccoli 0.000591975 0.00162701 0.363843 0.7160
Word: bunny 0.000577392 0.00162693 0.354897 0.7227
Word: butterfly 0.000649564 0.00162731 0.399165 0.6898
Word: camel 0.00114343 0.00162989 0.701534 0.4830
Word: carrot 0.00203069 0.00163455 1.24235 0.2141
Word: cat 0.00410297 0.00164549 2.49346 0.0127
Word: cherry 0.00221536 0.00163553 1.35452 0.1756
Word: chicken 0.00369529 0.00164333 2.24866 0.0245
Word: clove 0.00012808 0.00162458 0.0788388 0.9372
Word: crocodile -0.000749636 0.00161999 -0.46274 0.6436
Word: cucumber -0.000153309 0.00162311 -0.0944541 0.9247
Word: dog 0.0033922 0.00164173 2.06623 0.0388
Word: dolphin 0.00155248 0.00163204 0.951249 0.3415
Word: donkey 0.00273681 0.00163827 1.67054 0.0948
Word: eagle 0.00155643 0.00163206 0.953659 0.3403
Word: eggplant 0.00128519 0.00163064 0.788151 0.4306
Word: elephant 0.000913973 0.00162869 0.56117 0.5747
Word: fox 0.00247122 0.00163687 1.50972 0.1311
Word: frog 0.0024406 0.00163671 1.49116 0.1359
Word: gherkin -0.0029535 0.00160855 -1.83612 0.0663
Word: goat 0.0039133 0.00164448 2.37965 0.0173
Word: goose 0.00231002 0.00163602 1.41197 0.1580
Word: grape 0.0031909 0.00164067 1.94488 0.0518
Word: gull -0.000653618 0.0016205 -0.403344 0.6867
Word: hedgehog -0.00321096 0.00160722 -1.99784 0.0457
Word: horse 0.00236936 0.00163634 1.44797 0.1476
Word: kiwi 0.00284684 0.00163885 1.73709 0.0824
Word: leek -0.000567066 0.00162095 -0.349836 0.7265
Word: lemon 0.00406159 0.00164527 2.46865 0.0136
Word: lettuce 0.00140547 0.00163127 0.861579 0.3889
Word: lion 0.00324603 0.00164096 1.97813 0.0479
Word: magpie -0.00283215 0.00160918 -1.76 0.0784
Word: melon 0.0015061 0.0016318 0.922966 0.3560
Word: mole 0.000435963 0.00162619 0.268088 0.7886
Word: monkey 0.0030919 0.00164015 1.88514 0.0594
Word: moose 0.00161353 0.00163236 0.988461 0.3229
Word: mouse 0.00278345 0.00163852 1.69876 0.0894
Word: mushroom 0.00236613 0.00163632 1.44601 0.1482
Word: mustard 0.000370361 0.00162585 0.227796 0.8198
Word: olive 0.00340746 0.00164181 2.07543 0.0379
Word: orange 0.00366038 0.00164315 2.22767 0.0259
Word: owl 0.00182726 0.00163349 1.11863 0.2633
Word: paprika -0.00172592 0.00161492 -1.06873 0.2852
Word: peanut 0.0032834 0.00164116 2.00066 0.0454
Word: pear 0.00239641 0.00163648 1.46437 0.1431
Word: pig 0.0043727 0.00164692 2.65508 0.0079
Word: pineapple 0.00074701 0.00162782 0.458902 0.6463
Word: potato 0.00178238 0.00163325 1.09131 0.2751
Word: radish -0.000350782 0.00162208 -0.216255 0.8288
Word: reindeer -0.00230666 0.0016119 -1.43102 0.1524
Word: shark 0.00282996 0.00163876 1.72689 0.0842
Word: sheep 0.00331967 0.00164135 2.02252 0.0431
Word: snake 0.00229796 0.00163596 1.40465 0.1601
Word: spider 0.00227782 0.00163586 1.39243 0.1638
Word: squid 0.000583408 0.00162696 0.358587 0.7199
Word: squirrel -0.00137398 0.00161675 -0.84984 0.3954
Word: stork -0.00308618 0.00160787 -1.91942 0.0549
Word: strawberry 0.00209358 0.00163489 1.28056 0.2003
Word: swan 0.00133668 0.00163091 0.819593 0.4124
Word: tomato 0.00222563 0.00163558 1.36076 0.1736
Word: tortoise -0.00308983 0.00160785 -1.92172 0.0546
Word: vulture -0.00507203 0.00159763 -3.17472 0.0015
Word: walnut -7.66396e-5 0.00162351 -0.0472061 0.9623
Word: wasp -0.00229149 0.00161198 -1.42154 0.1552
Word: whale 0.00191123 0.00163393 1.16972 0.2421
Word: woodpecker -0.000608575 0.00162073 -0.375494 0.7073
julia> deviance(gm4)
1.785921568969029
so it is not really surprising that a GLMM with just a random intercept for Word ends up with an estimated standard deviation of zero.
julia> gmm1 = fit!(glmm(@formula(RT ~ 1 + NativeLanguage + (1|Word)), lexdec, Gamma()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: RT ~ 1 + NativeLanguage + (1 | Word)
Distribution: Distributions.Gamma{Float64}
Link: GLM.InverseLink()
Deviance (Laplace approximation): 2.0782
Variance components:
Column Variance Std.Dev.
Word (Intercept) 0 0
Number of obs: 1659; levels of grouping factors: 79
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 0.15827 0.00514038 30.7896 <1e-99
NativeLanguage: Other -0.00380929 0.00774463 -0.491862 0.6228
This should correspond exactly to the fixed-effects GLM but it doesn't. The standard errors of the coefficients are very different, which I think has to do with estimation of the dispersion parameter.
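A rough check of that guess, using only values printed above (a sketch, not output from either package): if the GLMM holds the Gamma dispersion fixed at 1 instead of estimating it, the standard errors should be inflated by roughly 1/√φ̂ relative to the GLM, and that is almost exactly the ratio observed.

```julia
# values copied from the fits shown above
φ = 0.001275883           # dispersion estimated by the fixed-effects Gamma GLM in R
se_glm  = 0.000183705     # (Intercept) std. error from the Gamma GLM (gm3)
se_glmm = 0.00514038      # (Intercept) std. error from the Gamma GLMM (gmm1)
ratio = se_glmm / se_glm  # ≈ 28.0
infl  = 1 / sqrt(φ)       # ≈ 28.0: consistent with the dispersion being fixed at 1
```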
However, glmer produces very different results.
julia> R"summary(glmer(RT ~ 1 + NativeLanguage + (1|Word), data=languageR::lexdec, family=Gamma(), nAGQ=1))"
WARNING: RCall.jl: Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00106101 (tol = 0.001, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
RCall.RObject{RCall.VecSxp}
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: Gamma ( inverse )
Formula: RT ~ 1 + NativeLanguage + (1 | Word)
Data: languageR::lexdec
AIC BIC logLik deviance df.resid
-355.6 -333.9 181.8 -363.6 1655
Scaled residuals:
Min 1Q Median 3Q Max
-2.7398 -0.6618 -0.1312 0.5145 4.7924
Random effects:
Groups Name Variance Std.Dev.
Word (Intercept) 2.143e-06 0.001464
Residual 1.162e-03 0.034082
Number of obs: 1659, groups: Word, 79
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 0.1583137 0.0003481 454.9 <2e-16 ***
NativeLanguageOther -0.0038086 0.0002571 -14.8 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
NtvLnggOthr -0.325
convergence code: 0
Model failed to converge with max|grad| = 0.00106101 (tol = 0.001, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
I'm not sure what to make of this. Do you have any ideas, Ben?
I will try to dig into this tomorrow. I agree that an identity link seems weird (and potentially could make things harder, numerically); I would generally prefer the log link for Gamma as being the most numerically stable, but the authors of the linked paper do make some theoretical and practical arguments for an identity link.
I might try the allFit extension to see how it behaves across optimizers.
The inverse Gaussian family probably wouldn't be very hard to implement.
Doug, you said
I am going to work on the assumption that the RT column here is what you have as rt_raw
but the OP said
I've converted RTs to raw RTs
and the ?languageR::lexdec says
‘RT’ a numeric vector for logarithmically transformed reaction times.
so you might be working with the wrong response variable ...
Thanks for the quick response, Ben.
I have a vague recollection of discussion related to the dispersion parameter for the Gamma family and how its value influences the evaluation of the log-likelihood or deviance. Does this ring a bell at all?
Thank you very much for a quick reply!
Regarding rt_raw, I did convert the lexdec RTs back to milliseconds, so all my examples are on raw RTs. The entire question discussed in the paper has to do with running the analysis on untransformed RTs instead of log-transformed RTs (or an identity versus a log link). However, since RTs are not normally distributed, the authors recommend using Gamma or Inverse Gaussian with an identity link (or inverse RTs, i.e. -1000/RT). This all has a theoretical basis in assumptions about mental chronometry and the additive nature of different effects on the response times measured in psycholinguistic experiments.
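For concreteness, the transformations under discussion can be sketched in Julia on a few values from the dataset (rt_raw and rt_inv are illustrative names, not columns of lexdec):

```julia
RT = [6.34036, 6.3081, 6.34914, 6.18621]  # log-transformed RTs, first rows of lexdec
rt_raw = exp.(RT)                         # back-transform to milliseconds
rt_inv = -1000 ./ rt_raw                  # the -1000/RT (negative inverse) transform
```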
While running these comparisons in R and Julia, I noticed that lme4, glmmTMB, and MixedModels in Julia give different results not only with glmm but also with lmm. For instance,
fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec))
will converge just fine in Julia and in R with lme4, but not with glmmTMB. The latter gives a convergence error and NA for AIC, BIC, logLik, and deviance. However, the fixed-effects estimates and variance components seem to be the same.
FWIW glmmTMB refuses to report a log-likelihood (or associated statistics) if the Hessian is not positive definite. You can still retrieve the negative log-likelihood with object$fit$objective ...
@hikea You write "Regarding raw_rt, I did convert lexdec RTs back to milliseconds, so all my examples are on raw RTs." I'm not sure what that means. Can you be more specific, perhaps giving a formula for rt_raw in terms of the RT column in the data set?
One thing about the RT response is that its range is comparatively small,
julia> extrema(lexdec[:RT])
(5.828946, 7.587311)
which means that there will not be much difference between the inverse link and the identity link. Also, if you look at a density plot for one of the combinations of levels of the fixed-effects factors, e.g. Class .== "animal" and NativeLanguage .== "English", the density of the RT values doesn't look much like a Gamma to me.
Perhaps it would be better to consider another example.
If I fit the model you indicate to the RT response I get a singular fit.
julia> lmm1 = fit!(lmm(@formula(RT ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
Formula: RT ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
logLik -2 logLik AIC BIC
460.29075 -920.58151 -896.58151 -831.61386
Variance components:
Column Variance Std.Dev. Corr.
Word (Intercept) 0.00589016370 0.076747402
Subject (Intercept) 0.00791110551 0.088944396
Class: plant 0.00009214977 0.009599467 1.00
NativeLanguage: Other 0.00503482335 0.070956489 1.00 1.00
Residual 0.02970852476 0.172361610
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 6.31315 0.0291443 216.617 <1e-99
Class: plant 0.0116479 0.0209007 0.557299 0.5773
NativeLanguage: Other 0.174059 0.0602617 2.88839 0.0039
Class: plant & NativeLanguage: Other -0.0411659 0.0177272 -2.32219 0.0202
julia> getθ(lmm1)
7-element Array{Float64,1}:
0.44527
0.516034
0.0556938
0.411672
0.0
7.63226e-5
0.0
julia> getΛ(lmm1)[2]
3×3 Array{Float64,2}:
0.516034 0.0 0.0
0.0556938 0.0 0.0
0.411672 7.63226e-5 0.0
The 3 by 3 covariance matrix for the Subject random-effects term is of rank 1.
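This can be verified directly from the printed factor: the third singular value is exactly zero and the second is negligible, so Λ, and hence the covariance matrix ΛΛ′, is effectively of rank 1. A sketch using the values copied from getΛ(lmm1)[2] above:

```julia
using LinearAlgebra                # needed on Julia ≥ 0.7; svdvals is in Base on 0.6
Λ = [0.516034   0.0        0.0
     0.0556938  0.0        0.0
     0.411672   7.63226e-5 0.0]
svdvals(Λ)                         # ≈ [0.66, 6e-5, 0.0] — effectively rank 1
```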
@bbolker As this model converges on the boundary, the Hessian is not required to be positive definite. Do you know if glmmTMB takes into account that a constrained optimization problem can converge to a minimum without the Hessian being positive definite?
I did the conversion like so in R (I don't know how to do it in Julia yet):
lexdec$rt_raw <- exp(lexdec$RT)
These are the density curves when I plot rt_raw, which is in milliseconds.
ggplot(lexdec, aes(x=rt_raw)) + geom_density() + facet_grid (NativeLanguage ~ Class, scale='free_x')
(figure: density curves of rt_raw, faceted by NativeLanguage and Class)
If you plot the log-transformed RTs (lexdec$RT), you get a more "normal"-looking picture, of course.
One of my concerns above is that running with random by-subject slopes for NativeLanguage should lead to non-convergence or a warning of some kind. glmmTMB did give convergence warnings on these models, but only for Gaussian fits, not Gamma. Julia, on the other hand, has not displayed any warnings for any model I have run so far.
@hikea Thanks for the explanation. I went back and looked at the documentation for languageR::lexdec and found that the RT column is logarithmic. Using lexdec[:rt_raw] = exp.(lexdec[:RT]) I get a model fit with a similar structure
julia> lmm2 = fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
logLik -2 logLik AIC BIC
-1.04862493×10⁴ 2.09724985×10⁴ 2.09964985×10⁴ 2.10614662×10⁴
Variance components:
Column Variance Std.Dev. Corr.
Word (Intercept) 2842.995443 53.3197472
Subject (Intercept) 2502.046695 50.0204628
Class: plant 28.963076 5.3817354 1.00
NativeLanguage: Other 4201.453564 64.8186205 1.00 1.00
Residual 16133.661777 127.0183521
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.536 17.4262 32.3384 <1e-99
Class: plant 6.61877 14.7386 0.449079 0.6534
NativeLanguage: Other 119.1 41.7748 2.85101 0.0044
Class: plant & NativeLanguage: Other -28.9027 12.9058 -2.23951 0.0251
julia> getΛ(lmm2)[2]
3×3 Array{Float64,2}:
0.393805 0.0 0.0
0.0423697 0.0 0.0
0.510309 -2.42915e-5 0.0
But it doesn't make sense to include a random effect for NativeLanguage with respect to Subject because NativeLanguage doesn't vary within subject. Fitting the simpler model
julia> lmm3 = fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class) | Subject) + (1 | Word)
logLik -2 logLik AIC BIC
-1.04893234×10⁴ 2.09786467×10⁴ 2.09966467×10⁴ 2.10453724×10⁴
Variance components:
Column Variance Std.Dev. Corr.
Word (Intercept) 2846.267870 53.350425
Subject (Intercept) 7091.953472 84.213737
Class: plant 30.075374 5.484102 1.00
Residual 16133.230828 127.016656
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.536 26.1962 21.5122 <1e-99
Class: plant 6.61877 14.7473 0.448812 0.6536
NativeLanguage: Other 119.1 38.0826 3.12742 0.0018
Class: plant & NativeLanguage: Other -28.9027 12.9141 -2.23808 0.0252
julia> getΛ(lmm3)[2]
2×2 Array{Float64,2}:
0.663013 0.0
0.0431762 0.0
still produces a singular fit. Notice that the log-likelihood has changed very little between lmm2 and lmm3.
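"Very little" can be quantified with the printed −2 logLik values (a sketch; the χ²₃ reference distribution is only approximate here, and conservative, because the dropped variance parameter is on the boundary):

```julia
using Distributions
# −2 logLik values printed for lmm2 and lmm3 above
lrt = 2.09786467e4 - 2.09724985e4  # likelihood-ratio statistic ≈ 6.15
# dropping the NativeLanguage slope removes one variance and two correlations
p = ccdf(Chisq(3), lrt)            # ≈ 0.10
```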
Because of the singularity I would go further and reduce the model to
julia> lmm4 = fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject) + (1 | Word)
logLik -2 logLik AIC BIC
-1.04896988×10⁴ 2.09793976×10⁴ 2.09933976×10⁴ 2.10312954×10⁴
Variance components:
Column Variance Std.Dev.
Word (Intercept) 2845.8845 53.346832
Subject (Intercept) 7506.9142 86.642450
Residual 16141.2087 127.048056
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.536 26.8482 20.9897 <1e-97
Class: plant 6.61877 14.6626 0.451405 0.6517
NativeLanguage: Other 119.1 39.1281 3.04386 0.0023
Class: plant & NativeLanguage: Other -28.9027 12.6888 -2.27782 0.0227
As a GLMM with the inverse link it is
julia> glmm1 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word)), lexdec, Gamma()), fast=true)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject) + (1 | Word)
Distribution: Distributions.Gamma{Float64}
Link: GLM.InverseLink()
Deviance (Laplace approximation): 90.5431
Variance components:
Column Variance Std.Dev.
Word (Intercept) 0.000000000000000 0.00000000000
Subject (Intercept) 0.000000015758068 0.00012553114
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 0.0017758 8.52957e-5 20.8194 <1e-95
Class: plant -2.05845e-5 0.000115231 -0.178637 0.8582
NativeLanguage: Other -0.000304747 0.000120173 -2.5359 0.0112
Class: plant & NativeLanguage: Other 6.98092e-5 0.000161009 0.433574 0.6646
which must be reduced to
julia> glmm2 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, Gamma()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject)
Distribution: Distributions.Gamma{Float64}
Link: GLM.InverseLink()
Deviance (Laplace approximation): 90.5248
Variance components:
Column Variance Std.Dev.
Subject (Intercept) 0.000000015872202 0.00012598493
Number of obs: 1659; levels of grouping factors: 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 0.00178218 8.55298e-5 20.8369 <1e-95
Class: plant -2.05235e-5 0.000115529 -0.177647 0.8590
NativeLanguage: Other -0.000304179 0.000120528 -2.52372 0.0116
Class: plant & NativeLanguage: Other 6.95288e-5 0.000161447 0.430659 0.6667
julia> glmm2.LMM.optsum
Initial parameter vector: [0.0017758, -2.05845e-5, -0.000304747, 6.98092e-5, 0.000125531]
Initial objective value: 90.54308478875748
Optimizer (from NLopt): LN_BOBYQA
Lower bounds: [-Inf, -Inf, -Inf, -Inf, 0.0]
ftol_rel: 1.0e-12
ftol_abs: 1.0e-8
xtol_rel: 0.0
xtol_abs: [1.0e-10]
initial_step: [2.84319e-5, 3.84104e-5, 4.00576e-5, 5.36696e-5, 3.13828e-5]
maxfeval: -1
Function evaluations: 77
Final parameter vector: [0.00178218, -2.05235e-5, -0.000304179, 6.95288e-5, 0.000125985]
Final objective value: 90.52478628806914
Return code: FTOL_REACHED
If you really want the identity link it would look like
julia> glmm3 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Word) + (1|Subject)), lexdec, Gamma(), GLM.IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Word) + (1 | Subject)
Distribution: Distributions.Gamma{Float64}
Link: GLM.IdentityLink()
Deviance (Laplace approximation): 91.9169
Variance components:
Column Variance Std.Dev.
Word (Intercept) 0.0000 0.000000
Subject (Intercept) 1492.1437 38.628276
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 561.603 26.8466 20.9189 <1e-96
Class: plant 6.47181 36.919 0.175298 0.8608
NativeLanguage: Other 114.858 45.1489 2.54399 0.0110
Class: plant & NativeLanguage: Other -29.445 62.1656 -0.473654 0.6357
julia> glmm4 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, Gamma(), GLM.IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject)
Distribution: Distributions.Gamma{Float64}
Link: GLM.IdentityLink()
Deviance (Laplace approximation): 91.9053
Variance components:
Column Variance Std.Dev.
Subject (Intercept) 1495.1625 38.66733
Number of obs: 1659; levels of grouping factors: 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.544 26.9076 20.9437 <1e-96
Class: plant 6.45074 37.0112 0.174292 0.8616
NativeLanguage: Other 114.57 45.2377 2.53262 0.0113
Class: plant & NativeLanguage: Other -29.3727 62.3007 -0.471467 0.6373
or, eliminating the insignificant fixed-effects terms involving Class
julia> glmm5 = fit!(glmm(@formula(rt_raw ~ 1 + NativeLanguage + (1|Subject)), lexdec, Gamma(), GLM.IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + NativeLanguage + (1 | Subject)
Distribution: Distributions.Gamma{Float64}
Link: GLM.IdentityLink()
Deviance (Laplace approximation): 92.1422
Variance components:
Column Variance Std.Dev.
Subject (Intercept) 1482.8712 38.508066
Number of obs: 1659; levels of grouping factors: 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 566.399 21.4695 26.3815 <1e-99
NativeLanguage: Other 101.59 35.368 2.87238 0.0041
Note that the deviance as reported here with the inverse link is slightly better (i.e. smaller) than in the corresponding fits with the identity link. I say "as reported here" because I am not entirely sure that the calculation of the deviance for a Gamma GLMM is correct. I am also unsure about the standard errors of the fixed-effects coefficients in this case.
@hikea Regarding the failure to produce a warning for models including random effects for NativeLanguage by Subject, the fitted model is obviously singular and I presume that the user is going to check for singularity. Trying to guess in what way a user may specify a nonsensical model and check if they have done so becomes a game of Whac-A-Mole and I don't think it is a good use of my time. Pull requests would be welcome, of course.
Thank you! It looks like I get a proper fit for the Gamma model with IdentityLink when I don't use fast=true
julia> lexdec_glmm_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject) + (1 | Word)
Distribution: Distributions.Gamma{Float64}
Link: GLM.IdentityLink()
Deviance (Laplace approximation): 91.9053
Variance components:
Column Variance Std.Dev.
Word (Intercept) 0.000016739485 0.0040913915
Subject (Intercept) 1496.106669109329 38.6795381191
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.546 26.909 20.9426 <1e-96
Class: plant 6.45169 37.0112 0.174317 0.8616
NativeLanguage: Other 114.567 45.2397 2.53245 0.0113
Class: plant & NativeLanguage: Other -29.3727 62.3006 -0.471468 0.6373
julia> getΛ(lexdec_glmm_gamma)[2]
1×1 Array{Float64,2}:
38.6795
@hikea Regrettably, that fit fails in the development version of MixedModels; I need to patch up some starting estimates.
julia> lexdec_glmm_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word)), lexdec, Gamma(), GLM.IdentityLink()))
ERROR: ArgumentError: invalid NLopt arguments
Stacktrace:
[1] chk(::Int32) at /home/bates/.julia/v0.6/NLopt/src/NLopt.jl:259
[2] chkn at /home/bates/.julia/v0.6/NLopt/src/NLopt.jl:277 [inlined]
[3] initial_step!(::NLopt.Opt, ::Array{Float64,1}) at /home/bates/.julia/v0.6/NLopt/src/NLopt.jl:365
[4] NLopt.Opt(::MixedModels.OptSummary{Float64}) at /home/bates/.julia/v0.6/MixedModels/src/types.jl:94
[5] #fit!#68(::Bool, ::Bool, ::Function, ::MixedModels.GeneralizedLinearMixedModel{Float64}) at /home/bates/.julia/v0.6/MixedModels/src/PIRLS.jl:213
[6] fit!(::MixedModels.GeneralizedLinearMixedModel{Float64}) at /home/bates/.julia/v0.6/MixedModels/src/PIRLS.jl:193
What happens is that the fast = true optimization is always performed first to get starting estimates for the more general fit. I will need to modify the starting estimates for the more general fit when one of the constrained θ parameters converges to zero in the fast = true case. See https://github.com/dmbates/MixedModels.jl/issues/95
In the fit you show, it looks as if the nonzero standard deviation for the Word random effect is essentially round-off error.
Not a problem. I've noticed some weird behaviour when using fast=true. Is it similar to calc.derivs = FALSE in lme4? When I use that, I can fit any model, fast, and without warnings.
The standard deviation for the Word random effect is indeed close to zero, but the fit is non-singular.
The fast=true version of the fit uses Penalized Iteratively Reweighted Least Squares (PIRLS) to determine the conditional mode of the random effects and the conditional estimate of the fixed-effects parameter, given a value of θ. It is similar to using nAGQ=0L in a glmer fit.
Generally the difference between the fast=true fit and the optimization of all the parameters using PIRLS only to determine the conditional modes of the random effects is very small.
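The nesting described here — optimize over θ only, solving for the remaining parameters conditionally at each step — can be illustrated with a toy objective. This is purely schematic (an assumed objective, not the MixedModels internals): β plays the role of what PIRLS determines conditionally, θ the role of the covariance parameter.

```julia
# Toy two-parameter objective with a closed-form conditional optimum.
f(θ, β) = (θ - 2.0)^2 + 5.0 * (β - θ)^2

βhat(θ)     = θ                   # conditional optimum of β given θ
profiled(θ) = f(θ, βhat(θ))       # profiled objective, a function of θ alone

# fast=true analogue: optimize the profiled objective over θ only.
θgrid = 0.0:0.01:4.0
θopt  = θgrid[argmin(map(profiled, θgrid))]
```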
When you say that you have noticed some weird behaviour, do you have examples you can share?
Unfortunately, I don't have anything to share at the moment, but I will when I encounter something again.
Thank you for all the explanations. I hope this post is useful not only to me but also to everyone who is new to GLMM and Julia.
Could you please address my questions 2 and 4? 2. Are interactions of random effects supported? 4. Are there any plans to implement support for Inverse Gaussian? (@bbolker mentioned that this is a possibility)
To address my third question, using my data (20k observations, 3x3x2 design), I dropped a by-word random slope (the maximal model for my data requires one) and used fast=true. This decreased the run time from 304 seconds to 117 seconds. With by-item slope and fast=true I get 406 seconds.
I have openblas-lapack installed, but I only allow 2 threads for openblas via export OPENBLAS_NUM_THREADS=2 as I am using a laptop and want to prevent overheating issues.
I'm falling behind a bit.
- with regard to interactions of random effects in Julia (Q2): I don't know how the formulas are evaluated, but at the very worst you can construct your own interaction term and add it to the data frame. The R analogue would be
lexdat$CNLint <- with(lexdat, interaction(Class,NativeLanguage))
then fit the model with a (1+CNLint|grpvar) term in it.
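In Julia the same hand-built interaction can be a pasted-together column; a base-Julia sketch, with `CNLint` as a purely illustrative name (how you attach it to the data frame depends on your DataFrames version):

```julia
# Hand-built interaction term, the Julia analogue of R's interaction():
# paste the two factor columns into one combined factor, then use it
# in the formula as, e.g., (1 + CNLint | Subject).
Class          = ["animal", "plant", "plant"]
NativeLanguage = ["English", "Other", "English"]
CNLint = string.(Class, "&", NativeLanguage)
```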
- with regard to an inverse-Gaussian family (Q4): you (or someone) would need to go to the Julia GLM package and implement all the bits of the inverse-Gaussian family (variance function etc.) that are needed. Shouldn't be too hard to port from R's inverse.gaussian function ... you could post an issue there if you were hoping that someone else would do it ...
- with regard to singular fits and variations among platforms:
- I agree with Doug that trying to detect bad model specifications is often whac-a-mole-ish, although lme4's pre-convergence checks (checking numbers of observations vs. rank of Z, number of levels, number of random effects) are reasonably sensible. It might not be impossible to come up with some automated ways to test for the required minimal replication levels, but it's not high on my list ...
glmmTMB parameterizes variance-covariance matrices by decomposing them into a correlation matrix (further parameterized by its lower-triangle elements, see here; those parameters are, I think, naturally unconstrained) and log-standard deviations. It is easy to see that singularity occurs when a log-sd gets sufficiently negative, but I'm not sure whether there are any other conditions/limits on the parameters (it's probably necessary but perhaps not sufficient for some set of the theta values to get large?)
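That decomposition amounts to mapping unconstrained numbers to a covariance matrix. A 2×2 toy version is sketched below, using a tanh map for the correlation; glmmTMB's actual correlation parameterization differs, so this is schematic only.

```julia
# Schematic unconstrained parameterization of a 2×2 covariance:
# two log-standard-deviations plus one unconstrained correlation
# parameter z mapped into (-1, 1) by tanh. Singularity corresponds
# to a log-sd heading to -Inf (sd → 0) or |z| → Inf (|ρ| → 1).
function cov2x2(logsd1, logsd2, z)
    s1, s2 = exp(logsd1), exp(logsd2)
    ρ = tanh(z)
    [s1^2  ρ*s1*s2; ρ*s1*s2  s2^2]
end
```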
@bbolker thank you! This is very helpful. I've never done any implementations, but I'll take a look.
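For orientation, the "bits" in question are mostly small formulas. The textbook inverse-Gaussian ingredients are below; the names are illustrative only, not GLM.jl's actual internal interface.

```julia
# Textbook inverse-Gaussian ingredients a GLM family needs:
# variance function V(μ) = μ³ and unit deviance d(y, μ) = (y − μ)² / (μ² y).
invgauss_variance(mu)   = mu^3
invgauss_unitdev(y, mu) = (y - mu)^2 / (mu^2 * y)
```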
Looking a bit more at this (not necessarily in the right direction), I've noticed that correlations of random effects do not match.
julia> lexdec_model_rt_raw = fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
logLik -2 logLik AIC BIC
-1.04862493×10⁴ 2.09724986×10⁴ 2.09964986×10⁴ 2.10614662×10⁴
Variance components:
Column Variance Std.Dev. Corr.
Word (Intercept) 2842.95699 53.319387
Subject (Intercept) 2502.20502 50.022045
Class: plant 28.94764 5.380301 1.00
NativeLanguage: Other 4200.34712 64.810085 1.00 1.00
Residual 16133.67862 127.018418
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.536 17.4266 32.3377 <1e-99
Class: plant 6.61877 14.7385 0.449082 0.6534
NativeLanguage: Other 119.1 41.7728 2.85115 0.0044
Class: plant & NativeLanguage: Other -28.9027 12.9057 -2.23952 0.0251
julia> lexdec_glmm_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
Distribution: Distributions.Gamma{Float64}
Link: GLM.IdentityLink()
Deviance (Laplace approximation): 88.5160
Variance components:
Column Variance Std.Dev. Corr.
Word (Intercept) 0.000013282934 0.0036445759
Subject (Intercept) 0.000000000000 0.0000000000
Class: plant 2.154574504719 1.4678468942 NaN
NativeLanguage: Other 6919.833771948175 83.1855382380 NaN 0.99
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.54 24.5249 22.9783 <1e-99
Class: plant 6.61341 37.0896 0.178309 0.8585
NativeLanguage: Other 114.65 50.0573 2.29038 0.0220
Class: plant & NativeLanguage: Other -30.0731 61.753 -0.486989 0.6263
> summary( lmer(rt_raw ~ Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word), data=lexdec, REML=F) )
Linear mixed model fit by maximum likelihood
t-tests use Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage | Subject) + (1 | Word)
Data: lexdec
AIC BIC logLik deviance df.resid
20996.5 21061.5 -10486.2 20972.5 1647
Scaled residuals:
Min 1Q Median 3Q Max
-2.1722 -0.5469 -0.1480 0.3082 8.1841
Random effects:
Groups Name Variance Std.Dev. Corr
Word (Intercept) 2842.96 53.319
Subject (Intercept) 2502.17 50.022
Classplant 28.96 5.381 1.00
NativeLanguageOther 4201.33 64.818 1.00 1.00
Residual 16133.66 127.018
Number of obs: 1659, groups: Word, 79; Subject, 21
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 563.536 17.427 19.500 32.338 <2e-16 ***
Classplant 6.619 14.738 103.630 0.449 0.6543
NativeLanguageOther 119.100 41.775 11.900 2.851 0.0147 *
Classplant:NativeLanguageOther -28.903 12.906 259.890 -2.240 0.0260 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Clsspl NtvLnO
Classplant -0.283
NtvLnggOthr -0.328 0.013
Clsspln:NLO 0.036 -0.375 0.037
> summary( glmer(rt_raw ~ Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word), data=lexdec, family=Gamma(link="identity")) )
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: Gamma ( identity )
Formula: rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage | Subject) + (1 | Word)
Data: lexdec
AIC BIC logLik deviance df.resid
20343.6 20408.6 -10159.8 20319.6 1647
Scaled residuals:
Min 1Q Median 3Q Max
-1.7847 -0.5918 -0.1717 0.3281 8.2704
Random effects:
Groups Name Variance Std.Dev. Corr
Word (Intercept) 1.529e+03 39.1034
Subject (Intercept) 8.950e+02 29.9174
Classplant 1.861e+01 4.3135 0.45
NativeLanguageOther 2.176e+03 46.6512 -0.15 0.81
Residual 4.127e-02 0.2032
Number of obs: 1659, groups: Word, 79; Subject, 21
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 580.836 11.858 48.98 < 2e-16 ***
Classplant 6.813 10.282 0.66 0.50759
NativeLanguageOther 124.780 10.898 11.45 < 2e-16 ***
Classplant:NativeLanguageOther -24.912 7.757 -3.21 0.00132 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Clsspl NtvLnO
Classplant 0.041
NtvLnggOthr 0.015 -0.037
Clsspln:NLO -0.059 -0.140 0.099
> summary( glmmTMB(rt_raw ~ Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word), data=lexdec) )
Family: gaussian ( identity )
Formula: rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage | Subject) + (1 | Word)
Data: lexdec
AIC BIC logLik deviance df.resid
NA NA NA NA 1647
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
Word (Intercept) 2.846e+03 5.335e+01
Subject (Intercept) 7.507e+03 8.664e+01
Classplant 5.503e-54 2.346e-27 -1.00
NativeLanguageOther 1.545e-17 3.931e-09 -0.97 0.97
Residual 1.614e+04 1.270e+02
Number of obs: 1659, groups: Word, 79; Subject, 21
Dispersion estimate for gaussian family (sigma^2): 1.61e+04
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 563.536 26.848 20.990 < 2e-16 ***
Classplant 6.619 14.663 0.451 0.65169
NativeLanguageOther 119.101 39.128 3.044 0.00234 **
Classplant:NativeLanguageOther -28.903 12.689 -2.278 0.02274 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning messages:
1: In nlminb(start = par, objective = fn, gradient = gr) :
NA/NaN function evaluation
2: In nlminb(start = par, objective = fn, gradient = gr) :
NA/NaN function evaluation
3: In glmmTMB(rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage | :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
> summary( glmmTMB(rt_raw ~ Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word), data=lexdec, family=list(family="Gamma",link="identity")) )
Family: Gamma ( identity )
Formula: rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage | Subject) + (1 | Word)
Data: lexdec
AIC BIC logLik deviance df.resid
NA NA NA NA 1647
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
Word (Intercept) 2.123e+03 4.608e+01
Subject (Intercept) 7.615e+04 2.760e+02
Classplant 6.378e-119 7.986e-60 -1.00
NativeLanguageOther 4.414e+03 6.644e+01 1.00 -0.99
Number of obs: 1659, groups: Word, 79; Subject, 21
Dispersion estimate for Gamma family (sigma^2): 0.0327
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 266.34 NA NA NA
Classplant 31.71 12.63 2.511 0.012 *
NativeLanguageOther 125.93 NA NA NA
Classplant:NativeLanguageOther -56.56 10.97 -5.154 2.55e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
There were 47 warnings (use warnings() to see them)
All correlations look pretty high, but there are discrepancies both in direction and magnitude.
@hikea You have almost convinced me that I should print warning messages about convergence to singular covariance matrices. All of the GLMM fits converge to singular covariance matrices for the Subject factor.
julia> using MixedModels, RCall
julia> lexdec = rcopy(R"languageR::lexdec");
julia> lexdec[:rt_raw] = exp.(lexdec[:RT]);
julia> lexdec_glmm_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
Distribution: Distributions.Gamma{Float64}
Link: GLM.IdentityLink()
Deviance (Laplace approximation): 88.5302
Variance components:
Column Variance Std.Dev. Corr.
Word (Intercept) 0.000008148119 0.0028544911
Subject (Intercept) 0.000000000000 0.0000000000
Class: plant 2.141102817949 1.4632507707 NaN
NativeLanguage: Other 6917.959907320534 83.1742743120 NaN 0.99
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.536 24.5248 22.9783 <1e-99
Class: plant 6.61613 37.0894 0.178383 0.8584
NativeLanguage: Other 110.297 49.9933 2.20624 0.0274
Class: plant & NativeLanguage: Other -30.3897 61.6327 -0.493078 0.6220
julia> getΛ(lexdec_glmm_gamma)[2]
3×3 Array{Float64,2}:
0.0 0.0 0.0
1.32808 0.614256 0.0
74.3338 34.6921 13.7442
Also, it is nonsensical to include random effects for NativeLanguage by Subject because NativeLanguage is a property of Subject.
You can't expect consistency in fits to nonsense models.
I am glad that I am being of help here as well (although I thought it was the opposite). I know that these are nonsensical models with inappropriate random-effects terms, but I thought I would get consistent results nevertheless. Thank you for implementing Inverse Gaussian in GLM --> https://github.com/JuliaStats/GLM.jl/issues/193
Thank you @dmbates I've just noticed that the Inverse Gaussian implementation was merged and it works in MixedModels (after using Pkg.checkout("GLM") and Pkg.checkout("MixedModels") ).
However, I get the same error you were getting earlier
julia> lexdec_inv_gauss = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, InverseGaussian(), IdentityLink()))
ERROR: ArgumentError: invalid NLopt arguments
but not with Gamma
julia> lexdec_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, Gamma(), IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject)
Distribution: Distributions.Gamma{Float64}
Link: GLM.IdentityLink()
Deviance (Laplace approximation): 91.9053
Variance components:
Column Variance Std.Dev.
Subject (Intercept) 1495.1625 38.66733
Number of obs: 1659; levels of grouping factors: 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.544 26.9076 20.9437 <1e-96
Class: plant 6.45074 37.0112 0.174292 0.8616
NativeLanguage: Other 114.57 45.2377 2.53262 0.0113
Class: plant & NativeLanguage: Other -29.3727 62.3007 -0.471467 0.6373
However, Gamma gives the same error if I include random intercepts for words, (1|Word), which have essentially zero variance in Julia (see below for R output)
julia> lexdec_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)+(1|Word)), lexdec, Gamma(), IdentityLink()))
ERROR: ArgumentError: invalid NLopt arguments
Using fast=true works around this issue:
julia> lexdec_inv_gauss = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, InverseGaussian(), IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject)
Distribution: Distributions.InverseGaussian{Float64}
Link: GLM.IdentityLink()
Deviance (Laplace approximation): 0.1438
Variance components:
Column Variance Std.Dev.
Subject (Intercept) 0 0
Number of obs: 1659; levels of grouping factors: 21
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 563.536 582.191 0.967957 0.3331
Class: plant 6.61877 883.313 0.00749313 0.9940
NativeLanguage: Other 119.1 1068.76 0.111438 0.9113
Class: plant & NativeLanguage: Other -28.9027 1580.41 -0.0182882 0.9854
We can see that the model with InverseGaussian returns zero variance for (1|Subject) as well. I guess that's why it didn't converge without fast=true, just like the Gamma model.
The fixed-effects estimates are close to the R output with nAGQ=0L, but the standard errors, random effects, and deviance are completely different. You can see that in R neither Word nor Subject has zero variance (the same is true for the Gamma model), although the residual variance is near zero.
> lexdec_inv_gauss_ <- glmer(rt_raw ~ 1+ Class*NativeLanguage + (1|Subject) + (1|Word), data=lexdec, family=inverse.gaussian(link="identity"), nAGQ=0)
> summary(lexdec_inv_gauss_)
Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
Family: inverse.gaussian ( identity )
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject) + (1 | Word)
Data: lexdec
AIC BIC logLik deviance df.resid
20185.1 20223.0 -10085.6 20171.1 1652
Scaled residuals:
Min 1Q Median 3Q Max
-1.7972 -0.5759 -0.1617 0.3632 8.2905
Random effects:
Groups Name Variance Std.Dev.
Word (Intercept) 1.400e+03 37.412913
Subject (Intercept) 1.909e+03 43.692319
Residual 6.799e-05 0.008245
Number of obs: 1659, groups: Word, 79; Subject, 21
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 561.088 14.604 38.42 < 2e-16 ***
Classplant 6.484 11.026 0.59 0.5565
NativeLanguageOther 109.614 21.039 5.21 1.89e-07 ***
Classplant:NativeLanguageOther -29.105 12.064 -2.41 0.0158 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Clsspl NtvLnO
Classplant -0.332
NtvLnggOthr -0.590 0.092
Clsspln:NLO 0.121 -0.370 -0.261
Updating the last example for the current release: the fast=true fit succeeds, but the subsequent general optimization fails (see #95).
julia> using RCall, MixedModels, GLM
julia> lexdec = rcopy(R"languageR::lexdec");
julia> lexdec[:rt_raw] = exp.(lexdec[:RT]);
julia> f = @formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word));
julia> lexdec_glmm_gamma = fit!(GeneralizedLinearMixedModel(f, lexdec, Gamma(), IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
rt_raw ~ 1 + Class + NativeLanguage + Class & NativeLanguage + (1 | Subject) + (1 | Word)
Distribution: Gamma{Float64}
Link: IdentityLink()
Deviance: 91.9169
Variance components:
Column Variance Std.Dev.
Word (Intercept) 0.0000 0.000000
Subject (Intercept) 1492.1437 38.628276
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
──────────────────────────────────────────────────────────────────────────────
Estimate Std.Error z value P(>|z|)
──────────────────────────────────────────────────────────────────────────────
(Intercept) 561.603 6.69334 83.9047 <1e-99
Class: plant 6.47181 9.20457 0.703109 0.4820
NativeLanguage: Other 114.858 11.2564 10.2038 <1e-23
Class: plant & NativeLanguage: Other -29.445 15.499 -1.8998 0.0575
──────────────────────────────────────────────────────────────────────────────
julia> fit!(lexdec_glmm_gamma, fast=false)
ERROR: ArgumentError: invalid NLopt arguments: zero step size
. . .
julia> lexdec_glmm_invgauss = fit!(GeneralizedLinearMixedModel(f, lexdec, InverseGaussian(), IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
rt_raw ~ 1 + Class + NativeLanguage + Class & NativeLanguage + (1 | Subject) + (1 | Word)
Distribution: InverseGaussian{Float64}
Link: IdentityLink()
Deviance: 0.1438
Variance components:
Column Variance Std.Dev.
Word (Intercept) 0 0
Subject (Intercept) 0 0
Number of obs: 1659; levels of grouping factors: 79, 21
Fixed-effects parameters:
─────────────────────────────────────────────────────────────────────────────
Estimate Std.Error z value P(>|z|)
─────────────────────────────────────────────────────────────────────────────
(Intercept) 563.536 6.03062 93.4458 <1e-99
Class: plant 6.61877 9.14979 0.72338 0.4694
NativeLanguage: Other 119.1 11.0707 10.7582 <1e-26
Class: plant & NativeLanguage: Other -28.9027 16.3706 -1.76552 0.0775
─────────────────────────────────────────────────────────────────────────────
julia> fit!(lexdec_glmm_invgauss, fast=false)
ERROR: ArgumentError: invalid NLopt arguments: zero step size
. . .