insight `get_predicted`: Satterthwaite and Kenward df

`get_predicted`: Satterthwaite and Kenward df

Open vincentarelbundock opened this issue 2 years ago • 14 comments

Is there an easy way to repurpose some of the code in parameters to allow these in insight::get_predicted or should I start from scratch?

Git Blame says that @strengejacke wrote a lot of the code there, so he may have ideas for and preferences over implementation

Jan 28 '22 20:01 vincentarelbundock

For reference, I believe that this is basically how emmeans gets them (perhaps more efficiently, I don’t know):

# estimate model and create a model matrix for predictions
library(emmeans)
library(lme4)
library(insight)

mod <- lmer(
  data = ChickWeight,
  weight ~ 1 + Time + (1 + Time | Chick))

s <- seq(from = 0, to = 21, length.out = 30)
newdata <- data.frame(Time = s)
newdata$weight <- 0
mm <- get_modelmatrix(mod, data = newdata)

# estimate with `emmeans`
em <- ref_grid(
  object = mod,
  specs = ~ Time,
  at = list(Time = s),
  lmer.df = "satterthwaite") |>
  as.data.frame()

# manual computation
get_dof_satterthwaite <- function(model, modelmatrix) {
    f <- function(i) {
        suppressMessages(
            lmerTest::calcSatterth(model, modelmatrix[i, , drop = FALSE])$denom)
    }
    out <- sapply(seq_len(nrow(modelmatrix)), f)
    return(out)
}
em$df_new <- get_dof_satterthwaite(mod, mm)

# comparison
em$equal <- em$df == em$df_new

em

##          Time prediction        SE       df   df_new equal
## 1   0.0000000   29.17800 1.9572766 49.13050 49.13050  TRUE
## 2   0.7241379   35.29918 1.6274605 49.27943 49.27943  TRUE
## 3   1.4482759   41.42035 1.3315708 49.47338 49.47338  TRUE
## 4   2.1724138   47.54153 1.0974018 49.56893 49.56893  TRUE
## 5   2.8965517   53.66270 0.9706994 49.12761 49.12761  TRUE
## 6   3.6206897   59.78388 0.9934691 48.30421 48.30421  TRUE
## 7   4.3448276   65.90505 1.1569189 47.99833 47.99833  TRUE
## 8   5.0689655   72.02623 1.4130459 48.07510 48.07510  TRUE
## 9   5.7931034   78.14740 1.7209569 48.20401 48.20401  TRUE
## 10  6.5172414   84.26858 2.0575334 48.30298 48.30298  TRUE
## 11  7.2413793   90.38975 2.4107990 48.37173 48.37173  TRUE
## 12  7.9655172   96.51093 2.7743858 48.41988 48.41988  TRUE
## 13  8.6896552  102.63210 3.1447159 48.45474 48.45474  TRUE
## 14  9.4137931  108.75328 3.5196614 48.48088 48.48088  TRUE
## 15 10.1379310  114.87445 3.8978907 48.50112 48.50112  TRUE
## 16 10.8620690  120.99563 4.2785329 48.51722 48.51722  TRUE
## 17 11.5862069  127.11680 4.6609970 48.53033 48.53033  TRUE
## 18 12.3103448  133.23798 5.0448686 48.54121 48.54121  TRUE
## 19 13.0344828  139.35915 5.4298491 48.55039 48.55039  TRUE
## 20 13.7586207  145.48033 5.8157185 48.55823 48.55823  TRUE
## 21 14.4827586  151.60151 6.2023107 48.56503 48.56503  TRUE
## 22 15.2068966  157.72268 6.5894986 48.57096 48.57096  TRUE
## 23 15.9310345  163.84386 6.9771829 48.57620 48.57620  TRUE
## 24 16.6551724  169.96503 7.3652854 48.58085 48.58085  TRUE
## 25 17.3793103  176.08621 7.7537431 48.58501 48.58501  TRUE
## 26 18.1034483  182.20738 8.1425053 48.58876 48.58876  TRUE
## 27 18.8275862  188.32856 8.5315304 48.59215 48.59215  TRUE
## 28 19.5517241  194.44973 8.9207839 48.59524 48.59524  TRUE
## 29 20.2758621  200.57091 9.3102372 48.59806 48.59806  TRUE
## 30 21.0000000  206.69208 9.6998662 48.60065 48.60065  TRUE

Jan 28 '22 21:01 vincentarelbundock

Relatedly, also HC robust SEs?

Jan 28 '22 23:01 bwiernik

Relatedly, also HC robust SEs?

I think this is already possible, but it's a bit hard to find in the docs because you have to click through to get_predicted_ci:

https://easystats.github.io/insight/reference/get_predicted_ci.html

https://easystats.github.io/insight/reference/get_predicted.html

Jan 29 '22 17:01 vincentarelbundock

w.r.t. previous post, there might be an argument to be made for bringing the vcov-related arguments directly into the get_predicted signature, for user-friendly discovery purposes.

Jan 29 '22 17:01 vincentarelbundock

And into the modelbased signatures

Jan 30 '22 11:01 bwiernik

(We should probably merge the get_predicted and get_predicted_ci documentation page)

Jan 31 '22 01:01 DominiqueMakowski

Also interesting in this context: https://github.com/glmmTMB/glmmTMB/blob/satterthwaite_df/glmmTMB/vignettes/satterthwaite_with_glmmTMB.Rmd

Apr 26 '22 10:04 strengejacke

I'm wondering whether we should make this an internal function? I expect get_df() to return df's for the parameters, not the predictions. We could also add Satterthwait/KR df's to get_df(), but then use a more efficient way.

library(lme4)
#> Loading required package: Matrix
library(insight)
library(parameters)

model <- lme4::lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)

dof_satterthwaite(model)
#>  (Intercept) Petal.Length 
#>     2.454955   144.560891

get_df(model, type = "satterthwaite", data = get_data(model))
#>   [1] 2.097581 2.097581 2.116998 2.079048 2.097581 2.044592 2.097581 2.079048
#>   [9] 2.097581 2.079048 2.079048 2.061388 2.097581 2.158524 2.137309 2.079048
#>  [17] 2.116998 2.097581 2.044592 2.079048 2.044592 2.079048 2.180654 2.044592
#>  [25] 2.013560 2.061388 2.061388 2.079048 2.097581 2.061388 2.061388 2.079048
#>  [33] 2.079048 2.097581 2.079048 2.137309 2.116998 2.097581 2.116998 2.079048
#>  [41] 2.116998 2.116998 2.116998 2.061388 2.013560 2.097581 2.061388 2.097581
#>  [49] 2.079048 2.097581 1.913469 1.900500 1.929574 1.881629 1.906595 1.900500
#>  [57] 1.913469 1.887420 1.906595 1.880160 1.881936 1.886868 1.881629 1.913469
#>  [65] 1.880344 1.895183 1.900500 1.883865 1.900500 1.880160 1.921127 1.881629
#>  [73] 1.929574 1.913469 1.890640 1.895183 1.921127 1.938814 1.900500 1.881936
#>  [81] 1.879456 1.879518 1.880160 1.948852 1.900500 1.900500 1.913469 1.895183
#>  [89] 1.883865 1.881629 1.895183 1.906595 1.881629 1.887420 1.886868 1.886868
#>  [97] 1.886868 1.890640 1.901423 1.883865 2.076164 1.948852 2.058643 2.011223
#> [105] 2.041985 2.199959 1.900500 2.133999 2.041985 2.094557 1.948852 1.971345
#> [113] 1.997103 1.938814 1.948852 1.971345 1.997103 2.223802 2.274346 1.938814
#> [121] 2.026181 1.929574 2.223802 1.929574 2.026181 2.076164 1.921127 1.929574
#> [129] 2.011223 2.041985 2.094557 2.155069 2.011223 1.948852 2.011223 2.094557
#> [137] 2.011223 1.997103 1.921127 1.983812 2.011223 1.948852 1.948852 2.058643
#> [145] 2.026181 1.959694 1.938814 1.959694 1.983812 1.948852


microbenchmark::microbenchmark(
  dof_satterthwaite(model),
  get_df(model, type = "satterthwaite", data = get_data(model)),
  10
)
#> Warning in microbenchmark::microbenchmark(dof_satterthwaite(model),
#> get_df(model, : Could not measure a positive execution time for 18 evaluations.
#> Unit: nanoseconds
#>                                                           expr        min
#>                                       dof_satterthwaite(model)   24243201
#>  get_df(model, type = "satterthwaite", data = get_data(model)) 2675164601
#>                                                             10          0
#>          lq         mean     median         uq        max neval cld
#>    24753351 2.582621e+07   25208152   26284050   31916500   100  b 
#>  2724342701 2.757082e+09 2738195701 2763005401 3204777800   100   c
#>           0 5.974000e+01          1        101        402   100 a



model2 <- lmerTest::lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
summary(model2, ddf = "Satterthwaite")
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: Sepal.Length ~ Petal.Length + (1 | Species)
#>    Data: iris
#> 
#> REML criterion at convergence: 119.8
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.23334 -0.67332 -0.01684  0.67680  3.04429 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  Species  (Intercept) 1.1617   1.0778  
#>  Residual             0.1143   0.3381  
#> Number of obs: 150, groups:  Species, 3
#> 
#> Fixed effects:
#>               Estimate Std. Error        df t value Pr(>|t|)    
#> (Intercept)    2.50446    0.66743   2.45496   3.752   0.0463 *  
#> Petal.Length   0.88847    0.06379 144.56089  13.927   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr)
#> Petal.Lngth -0.359

dof_satterthwaite(model)
#>  (Intercept) Petal.Length 
#>     2.454955   144.560891
dof_satterthwaite(model2)
#>  (Intercept) Petal.Length 
#>     2.454955   144.560891

^{Created on 2022-04-29 by the reprex package (v2.0.1)}

Apr 29 '22 11:04 strengejacke

How about an argument to get df for either parameters or predictions, defaulting you parameters?

Apr 29 '22 16:04 bwiernik

The argument sounds find, but I don't have a strong opinion about this. I'd like to know before releasing the next version of my package though ;)

May 02 '22 18:05 vincentarelbundock

I'd say we stick to the implementation as it is, and can add options for sw / kr df's for parameters (not predictions) later, controlled by an argument, that probably defaults to the current behaviour.

We could already decide on the additional argument now, even if it has no function yet (because we have satterthwaite for now only for predictions)? Then you can better "plan" or anticipate forthcoming changes in insight for your packages?

May 02 '22 19:05 strengejacke

Aren't we moving degrees_of_freedom to insight?

May 02 '22 19:05 vincentarelbundock

Not moving, rather "integrating", I'd say. Much of the code from degrees_of_freedom() was just re-used when implementing get_df(). I would thus suggest to have the functionality in insight, while parameters just re-exports get_df() and probably adds an degrees_of_freedom() alias.

May 02 '22 20:05 strengejacke

Let's not decide on anything now and release as-is. My guess is we'll develop stronger preferences about the specific interface once we do the "integration" work you suggest.

May 03 '22 15:05 vincentarelbundock

@strengejacke I believe that your commit https://github.com/easystats/insight/commit/87c092758ebaba97b8fbd029afe54b4418a8a8d8 broke the Satterthwaite DF estimation for get_predicted(). When we call get_df(), we now get one DF per coefficient. This is not what the old code was doing, and I don't think this is what we want to build CIs around predictions.

Edit: Noticed because the change broke marginaleffects tests downstream.

Sep 29 '22 00:09 vincentarelbundock

Hm, this should still work, see https://github.com/easystats/insight/blob/e08e9f7fa4f7a2c680cdb1bb74ea0e8e062f47ca/R/get_predicted_ci.R#L256

Sep 29 '22 05:09 strengejacke

OK, but how do I get the per observation degrees of freedom for use in my own functions in marginaleffects? Those used to be the default. Are they still exposed or only available to insight functions?

Sep 29 '22 10:09 vincentarelbundock

Ah, got it. I thought you were using get_predictions() directly. I can add an argument that will return the per observation df's. But I think the default should be per parameter.

Sep 29 '22 15:09 strengejacke

Ah, got it. I thought you were using get_predictions() directly. I can add an argument that will return the per observation df's. But I think the default should be per parameter.

Thanks, that would be great. And yes, I agree that default makes sense.

Sep 29 '22 15:09 vincentarelbundock

See last commit - should we make this change the next release (will break marginaleffects) or keep the old default, switching to per-parameter-df a release later?

Sep 29 '22 19:09 strengejacke

I'm on the move, so I can't test it now, but at firdt glance this looks great. Thanks!

The last marginaleffects was released just a few days after the last insight. I think we can just keep things in sync like that and it should be fine. This is kind of a niche feature anyway, and people can install the dev if they want.

Sep 29 '22 21:09 vincentarelbundock

Looks like it's working now. Thanks again, I really appreciate the help on this!

Oct 06 '22 03:10 vincentarelbundock

You should be safe if you update your package before insight, right?

Oct 06 '22 05:10 strengejacke

Yes, everything should be fine. I'm no longer running tests on CRAN...

Oct 06 '22 10:10 vincentarelbundock

insight insight copied to clipboard

`get_predicted`: Satterthwaite and Kenward df

insight
insight copied to clipboard