Take the dispersion into account for GLMM families with a dispersion parameter
Closes #206
- [x] varest
- [ ] ~deviance~ (this is handled implicitly)
- [x] loglikelihood (usage sketch for these after this list)
- [x] export all the usual links
  - [x] identity
  - [x] sqrt
  - [x] log
  - [x] inverse
  - [x] probit
  - [x] logit
- [ ] tests for GLMM functionality in all families
  - [x] Bernoulli (contra)
    - [x] deviance
    - [x] loglikelihood
  - [ ] Binomial (cbpp, verbagg)
    - [x] deviance
    - [ ] loglikelihood
  - [ ] Gamma
    - [ ] deviance
    - [ ] loglikelihood
  - [x] Gaussian with non-identity link
    - [x] deviance
    - [ ] loglikelihood
  - [ ] InverseGaussian
    - [ ] deviance
    - [ ] loglikelihood
  - [ ] Poisson (grouseticks)
    - [x] deviance
    - [ ] loglikelihood
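For the top-level items, a hedged usage sketch (the model mirrors the transcripts further down; the exact return values depend on the fit):

```julia
using MixedModels

m = fit(MixedModel, @formula(yield ~ 1 + (1|batch)),
        MixedModels.dataset(:dyestuff), Normal(), SqrtLink(), fast=true)

varest(m)         # estimate of the residual variance / dispersion
loglikelihood(m)  # log-likelihood that now accounts for the dispersion
deviance(m)       # handled implicitly, per the struck-through item
```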
Note (to self) that some distributions (e.g. Gamma) require changing the linear predictor + dispersion into a different parameterization (e.g. shape, scale).
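For instance (a minimal sketch, not this PR's code): with mean μ obtained from the inverse link and dispersion φ, the Gamma shape/scale parameters are k = 1/φ and θ = μφ, which recover mean μ and variance φμ².

```julia
using Distributions

# Linear predictor η plus dispersion φ → Distributions.jl's
# (shape, scale) parameterization of Gamma. Log link assumed here.
function gamma_from_glm(eta::Real, phi::Real; invlink=exp)
    mu = invlink(eta)               # mean on the response scale
    return Gamma(1 / phi, mu * phi) # shape = 1/φ, scale = μφ
end

d = gamma_from_glm(1.5, 0.2)
mean(d)  # ≈ exp(1.5)
var(d)   # ≈ 0.2 * exp(1.5)^2, i.e. φμ²
```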
Codecov Report
Patch coverage has no change and project coverage change: -3.10% :warning:
Comparison is base (df203bc) 95.83% compared to head (d6db3b0) 92.74%.
:exclamation: Current head d6db3b0 differs from pull request most recent head 0af4eeb. Consider uploading reports for the commit 0af4eeb to get more accurate results.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #291      +/-   ##
==========================================
- Coverage   95.83%   92.74%   -3.10%
==========================================
  Files          35       23      -12
  Lines        3267     1722    -1545
==========================================
- Hits         3131     1597    -1534
+ Misses        136      125      -11
```
| Flag | Coverage Δ | |
|---|---|---|
| current | ? | |
| minimum | ? | |
| nightly | ? | |
Flags with carried forward coverage won't be shown.
For tinkering and comparison to lme4, look at this gist.
After pulling my hair out for far too long trying to figure out why fast fits worked but 'slow' ones didn't, I tried a different optimizer. It turns out that I had accidentally stumbled across a case where BOBYQA fails dramatically. The old two-stage optimization for slow fits didn't have this problem because it started from the fast fit's solution instead of the naive initial values.
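Before the transcripts, a minimal sketch of that two-stage idea (assumed workflow for illustration; the real logic lives inside `fit!` and its internals differ):

```julia
using MixedModels

m = GeneralizedLinearMixedModel(@formula(yield ~ 1 + (1|batch)),
                                MixedModels.dataset(:dyestuff),
                                Normal(), SqrtLink())

# Stage 1: the fast fit optimizes only the covariance parameters (θ),
# with the fixed effects profiled out via PIRLS.
fit!(m; fast=true)

# Stage 2: optimize the joint (β, θ) vector, starting from the stage-1
# solution rather than a naive initial point.
fit!(m; fast=false)
```

The transcripts below show the failure directly: BOBYQA and Nelder-Mead on the same model, started from the same point.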
```julia
julia> bobyqa = GeneralizedLinearMixedModel(@formula(yield ~ 1 + (1|batch)), MixedModels.dataset(:dyestuff), Normal(), SqrtLink());
┌ Warning: Model has not been fit
└ @ MixedModels ~/Work/MixedModels.jl/src/generalizedlinearmixedmodel.jl:563
julia> fit!(bobyqa)
┌ Warning: Model has not been fit
└ @ MixedModels ~/Work/MixedModels.jl/src/generalizedlinearmixedmodel.jl:563
┌ Warning: Model has not been fit
└ @ MixedModels ~/Work/MixedModels.jl/src/generalizedlinearmixedmodel.jl:563
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
yield ~ 1 + (1 | batch)
Distribution: Normal{Float64}
Link: SqrtLink()
Deviance: 58893.7814
Variance components:
Column Variance Std.Dev.
batch (Intercept) 1954.5903 44.21075
Residual 2178.8889 46.67857
Number of obs: 30; levels of grouping factors: 6
Fixed-effects parameters:
──────────────────────────────────────────────────
Estimate Std.Error z value P(>|z|)
──────────────────────────────────────────────────
(Intercept) 39.0793 18.0493 2.17 0.0304
──────────────────────────────────────────────────
julia> bobyqa.optsum
Initial parameter vector: [39.08324449172631, 1.0]
Initial objective value: 58893.79526421842
Optimizer (from NLopt): LN_BOBYQA
Lower bounds: [-Inf, 0.0]
ftol_rel: 1.0e-12
ftol_abs: 1.0e-8
xtol_rel: 0.0
xtol_abs: [1.0e-10]
initial_step: [39.08324449172631, 0.75]
maxfeval: -1
Function evaluations: 26
Final parameter vector: [39.0788080375932, 0.998348758382195]
Final objective value: 58893.7814171595
Return code: FTOL_REACHED
julia> neldermead = GeneralizedLinearMixedModel(@formula(yield ~ 1 + (1|batch)), MixedModels.dataset(:dyestuff), Normal(), SqrtLink());
┌ Warning: Model has not been fit
└ @ MixedModels ~/Work/MixedModels.jl/src/generalizedlinearmixedmodel.jl:563
┌ Warning: Model has not been fit
└ @ MixedModels ~/Work/MixedModels.jl/src/generalizedlinearmixedmodel.jl:563
julia> neldermead.optsum.optimizer = :LN_NELDERMEAD;
julia> fit!(neldermead)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
yield ~ 1 + (1 | batch)
Distribution: Normal{Float64}
Link: SqrtLink()
Deviance: 58890.8511
Variance components:
Column Variance Std.Dev.
batch (Intercept) 599.565 24.486016
Residual 2178.889 46.678570
Number of obs: 30; levels of grouping factors: 6
Fixed-effects parameters:
──────────────────────────────────────────────────
Estimate Std.Error z value P(>|z|)
──────────────────────────────────────────────────
(Intercept) 39.0793 9.99691 3.91 <1e-4
──────────────────────────────────────────────────
julia> neldermead.optsum
Initial parameter vector: [39.08324449172631, 1.0]
Initial objective value: 58893.79526421842
Optimizer (from NLopt): LN_NELDERMEAD
Lower bounds: [-Inf, 0.0]
ftol_rel: 1.0e-12
ftol_abs: 1.0e-8
xtol_rel: 0.0
xtol_abs: [1.0e-10]
initial_step: [39.08324449172631, 0.75]
maxfeval: -1
Function evaluations: 80
Final parameter vector: [39.07930704835212, 0.5529134950119795]
Final objective value: 58890.851102667264
Return code: FTOL_REACHED
```
For reference, here's the fast fit:

```julia
julia> fastfit = fit(MixedModel, @formula(yield ~ 1 + (1|batch)), MixedModels.dataset(:dyestuff), Normal(), SqrtLink(), fast=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
yield ~ 1 + (1 | batch)
Distribution: Normal{Float64}
Link: SqrtLink()
Deviance: 58890.8511
Variance components:
Column Variance Std.Dev.
batch (Intercept) 599.6439 24.487628
Residual 2178.8889 46.678570
Number of obs: 30; levels of grouping factors: 6
Fixed-effects parameters:
──────────────────────────────────────────────────
Estimate Std.Error z value P(>|z|)
──────────────────────────────────────────────────
(Intercept) 39.0793 9.99757 3.91 <1e-4
──────────────────────────────────────────────────
julia> fastfit.optsum
Initial parameter vector: [1.0]
Initial objective value: 58893.79517237531
Optimizer (from NLopt): LN_BOBYQA
Lower bounds: [0.0]
ftol_rel: 1.0e-12
ftol_abs: 1.0e-8
xtol_rel: 0.0
xtol_abs: [1.0e-10]
initial_step: [0.75]
maxfeval: -1
Function evaluations: 26
Final parameter vector: [0.55294989485893]
Final objective value: 58890.85110260193
Return code: FTOL_REACHED
```
Based on tinkering elsewhere (e.g. toying with fitting a Gamma model with identity and log links to the languageR::lexdec data -- there was a proposal to use Gamma with an identity link to model RT data), this type of optimizer problem seems to keep cropping up for GLMMs with a dispersion parameter. I suspect that the Gamma model I was trying is already on very shaky ground in terms of numerical stability, but without better test data, I can't really check this.
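For concreteness, the lexdec experiment looks roughly like this (hedged sketch: assumes the data is available through RDatasets and that the column names match the languageR documentation; RT in lexdec is stored on the log scale):

```julia
using MixedModels, RDatasets

lexdec = dataset("languageR", "lexdec")
lexdec.RawRT = exp.(lexdec.RT)  # back-transform log RT to raw latencies

# Gamma GLMM with an identity link for raw reaction times; swapping in
# LogLink() gives the log-link variant. Both stressed the optimizer.
gm = fit(MixedModel,
         @formula(RawRT ~ 1 + NativeLanguage + (1|Subject) + (1|Word)),
         lexdec, Gamma(), IdentityLink())
```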
I don't know if the optimizer issues are particular to the datasets I've been testing on (which aren't ideal for the models I've been fitting) or indicative of a larger issue in optimization for GLMMs with dispersion parameters.
But given all that, I think it's ready for an initial review.
Do you want to continue to use StableRNG now that the Random stdlib in v1.7.0 has switched to Xoshiro as the default RNG?
For tests, yes, because there could in theory be further optimizations to MersenneTwister (or, more specifically, to the methods that interpret the stream produced by MT), which would change the generated values and break hard-coded reference results.
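A toy illustration of the distinction (just the mechanism, not values from any actual test):

```julia
using Random, StableRNGs

# StableRNG guarantees the same stream on every Julia release, so
# hard-coded reference values in tests stay valid.
stable = StableRNG(2021)
rand(stable, 3)   # identical on every Julia version

# The default RNG (MersenneTwister before 1.7, Xoshiro from 1.7 on)
# makes no such guarantee; its stream may change between releases.
rand(Random.default_rng(), 3)  # may differ across Julia versions
```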
On the PRNG front: I'm still looking at how we can take advantage of Xoshiro and related improvements for our embarrassingly parallel things.
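One possible shape for that, purely as a sketch (`one_replicate` is a hypothetical stand-in for a unit of work such as a bootstrap replicate):

```julia
using Random

# Hypothetical stand-in for one embarrassingly parallel work unit.
one_replicate(rng::AbstractRNG) = sum(rand(rng, 100))

function parallel_replicates(nrep::Integer; seed::Integer=42)
    results = Vector{Float64}(undef, nrep)
    Threads.@threads for i in 1:nrep
        # Xoshiro construction is cheap, so each replicate can get its
        # own deterministic stream regardless of thread scheduling.
        rng = Xoshiro(hash((seed, i)))
        results[i] = one_replicate(rng)
    end
    return results
end
```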