MixedModel Equivalent to GLM's (mod.mm.assign)
How can I derive an equivalent output of a MixedModel like GLM's model.mm.assign property?
I'm having a hard time finding it in the documentation and source code. I apologize if the solution is rather simple.
Thank you.
What are you trying to accomplish? There is nothing in MixedModels.jl that directly corresponds to StatsModels.ModelMatrix.assign. Depending on what you're trying to accomplish, I can potentially provide better guidance.
I'm creating a function to calculate Variance Inflation Factors for the GLM package(https://github.com/JuliaStats/GLM.jl/issues/428). The code I am creating with can also work with Mixed Models. However, to properly assess factored variables, I need something similar to the model.mm.assign that GLM provides for Mixed Models.
Thank you for your time and help.
There is already a VIF implementation for mixed models: https://palday.github.io/MixedModelsExtras.jl/stable/api/#MixedModelsExtras.vif.
The implementation there would also work for GLM.jl models with an appropriate method for termnames. Indeed the type restriction for both vif and gvif is written as RegressionModel, which GLM.jl models are a subtype of. Realistically termnames needs to be upstreamed to StatsModels.jl as
function termnames(f::FormulaTerm)
# copy body from MixedModelsExtras.jl
end
termnames(model::RegressionModel) = termnames(formula(model))
and vif and gvif should be stubbed (and thus owned by) StatsAPI.jl. Before StatsAPI was split off from StatsBase, I had started working on this (https://github.com/JuliaStats/StatsBase.jl/pull/698/files), but we decided that the best option was to try it out in a end-user package and work out the API making the API part of StatsBase/StatsAPI. It's a lot easier to have a breaking change+release in MixedModelsExtras than in StatsBase/StatsAPI because there's less of an impact on downstream packages.
cc: @kleinschmidt and @nalimilan
Oops, sorry for not realizing that we had discussed this before. For some reason GitHub didn't add a link to https://github.com/JuliaStats/StatsBase.jl/pull/698 at https://github.com/JuliaStats/GLM.jl/issues/428, despite the mention.
@HiramTheHero Is the implementation in MixedModelsExtras similar to what you wanted to do based on the car R package?
Anyway it seems we should add empty function definitions to StatsAPI, and termnames definitions in StatsModels. Then vif and gvif can either be defined separately in GLM.jl and in MixedModelsExtras.jl, or via a single common definition in StatsModels.jl (but @kleinschmidt is hesitant about the latter solution so we could choose the former, at least as a first step).
@HiramTheHero Is the implementation in MixedModelsExtras similar to what you wanted to do based on the car R package?
The vif function I created creates the same results as the car R package. The gvif function in MixedModelsExtras uses the same algorithm created by John Fox. (Which is what I used in my function.) The vif function in MixedModelsExtras uses a different/simpler algorithm that doesn't take into account non-factored variables correctly. (Which is what the gvif function is for to my understanding.)
In the example below, my function is just the vif function. Here is a comparison between @palday's work and mine. In my opinion, them seem similar enough.
julia> MOD = lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth), iri
s)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.Dens
ePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Mat
rix{Float64}}
SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth
Coefficients:
─────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept) 1.856 0.250777 7.40 <1e-11 1.36038 2.35162
SepalWidth 0.650837 0.0666474 9.77 <1e-16 0.519119 0.782555
PetalLength 0.709132 0.0567193 12.50 <1e-24 0.597035 0.821229
PetalWidth -0.556483 0.127548 -4.36 <1e-04 -0.808561 -0.304404
─────────────────────────────────────────────────────────────────────────
julia> MOD2 = lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + Sp
ecies), iris)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.Dens
ePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Mat
rix{Float64}}
SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth + Species
Coefficients:
──────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────────────
(Intercept) 2.17127 0.279794 7.76 <1e-11 1.61823 2.7243
SepalWidth 0.495889 0.0860699 5.76 <1e-07 0.325765 0.666013
PetalLength 0.829244 0.0685276 12.10 <1e-22 0.693794 0.964694
PetalWidth -0.315155 0.151196 -2.08 0.0389 -0.614005 -0.0163054
Species: versicolor -0.723562 0.240169 -3.01 0.0031 -1.19827 -0.24885
Species: virginica -1.0235 0.333726 -3.07 0.0026 -1.68313 -0.363863
──────────────────────────────────────────────────────────────────────────────────
julia> vif(MOD).vif
3-element Vector{Float64}:
1.2708148956298828
15.097572326660156
14.234334945678711
julia> MixedModelsExtras.vif(MOD)
3-element Vector{Float64}:
1.2708149293446902
15.097572322916154
14.234334971742324
julia> vif(MOD2).gvif
4-element Vector{Float32}:
2.2274659
23.161648
21.0214
40.039177
julia> MixedModelsExtras.gvif(MOD2, scale = false)
4-element Vector{Float64}:
2.2274657532417894
23.16164777422666
21.02140052819936
40.039177290459335
julia> vif(MOD2).gvifModified
4-element Vector{Float32}:
1.4924697
4.812655
4.58491
2.5154824
julia> MixedModelsExtras.gvif(MOD2, scale = true)
4-element Vector{Float64}:
1.4924696825201473
4.812654961061167
4.584910089434619
2.515482418758807
Just to input my 2 cents on the matter. Whatever is used going forward, I would suggest either implementing one function that calculates a VIF or GVIF based on the if a model contains factored variables or not, or have explicit warnings to use gvif if the vif function is used on a model with factored variables.
OK, great. Maybe we could have vif return the GVIF by default unless a gvif=false argument is passed. What do you think @palday?
I don't like the approach of printing warnings when vif if called on a model with categorical predictors. Warnings are annoying and don't make a lot of sense: either the result is legit and no warning should be printed, or it's not and we shouldn't return it at all.
@nalimilan I have a slight preference to have gvif and vif as separate functions, much like r2 and adjr2 are. We don't have to issue the warning for categorical models with vif -- there are cases (e.g. non orthogonal contrasts) where it's interesting to see the VIF for individual coefficients within a single categorical term. For GLM.jl, there's also the question of how gvif would work for models fit with the matrix interface (my preference: error).
As currently implemented, gvif depends on termnames, which probably needs to live in / be owned by StatsModels.
I think the MixedModels portion of this is complete and the necessary stubs have been upstreamed to StatsAPI/StatsModels. Anything else is a GLM.jl issue, so I'm closing this. :smile: