insight
insight copied to clipboard
`get_predicted`: Satterthwaite and Kenward df
Is there an easy way to repurpose some of the code in parameters
to allow these in insight::get_predicted
or should I start from scratch?
Git Blame says that @strengejacke wrote a lot of the code there, so he may have ideas for and preferences over implementation
For reference, I believe that this is basically how emmeans
gets them (perhaps more efficiently, I don’t know):
# estimate model and create a model matrix for predictions
library(emmeans)
library(lme4)
library(insight)
mod <- lmer(
data = ChickWeight,
weight ~ 1 + Time + (1 + Time | Chick))
s <- seq(from = 0, to = 21, length.out = 30)
newdata <- data.frame(Time = s)
newdata$weight <- 0
mm <- get_modelmatrix(mod, data = newdata)
# estimate with `emmeans`
em <- ref_grid(
object = mod,
specs = ~ Time,
at = list(Time = s),
lmer.df = "satterthwaite") |>
as.data.frame()
# manual computation
get_dof_satterthwaite <- function(model, modelmatrix) {
f <- function(i) {
suppressMessages(
lmerTest::calcSatterth(model, modelmatrix[i, , drop = FALSE])$denom)
}
out <- sapply(seq_len(nrow(modelmatrix)), f)
return(out)
}
em$df_new <- get_dof_satterthwaite(mod, mm)
# comparison
em$equal <- em$df == em$df_new
em
## Time prediction SE df df_new equal
## 1 0.0000000 29.17800 1.9572766 49.13050 49.13050 TRUE
## 2 0.7241379 35.29918 1.6274605 49.27943 49.27943 TRUE
## 3 1.4482759 41.42035 1.3315708 49.47338 49.47338 TRUE
## 4 2.1724138 47.54153 1.0974018 49.56893 49.56893 TRUE
## 5 2.8965517 53.66270 0.9706994 49.12761 49.12761 TRUE
## 6 3.6206897 59.78388 0.9934691 48.30421 48.30421 TRUE
## 7 4.3448276 65.90505 1.1569189 47.99833 47.99833 TRUE
## 8 5.0689655 72.02623 1.4130459 48.07510 48.07510 TRUE
## 9 5.7931034 78.14740 1.7209569 48.20401 48.20401 TRUE
## 10 6.5172414 84.26858 2.0575334 48.30298 48.30298 TRUE
## 11 7.2413793 90.38975 2.4107990 48.37173 48.37173 TRUE
## 12 7.9655172 96.51093 2.7743858 48.41988 48.41988 TRUE
## 13 8.6896552 102.63210 3.1447159 48.45474 48.45474 TRUE
## 14 9.4137931 108.75328 3.5196614 48.48088 48.48088 TRUE
## 15 10.1379310 114.87445 3.8978907 48.50112 48.50112 TRUE
## 16 10.8620690 120.99563 4.2785329 48.51722 48.51722 TRUE
## 17 11.5862069 127.11680 4.6609970 48.53033 48.53033 TRUE
## 18 12.3103448 133.23798 5.0448686 48.54121 48.54121 TRUE
## 19 13.0344828 139.35915 5.4298491 48.55039 48.55039 TRUE
## 20 13.7586207 145.48033 5.8157185 48.55823 48.55823 TRUE
## 21 14.4827586 151.60151 6.2023107 48.56503 48.56503 TRUE
## 22 15.2068966 157.72268 6.5894986 48.57096 48.57096 TRUE
## 23 15.9310345 163.84386 6.9771829 48.57620 48.57620 TRUE
## 24 16.6551724 169.96503 7.3652854 48.58085 48.58085 TRUE
## 25 17.3793103 176.08621 7.7537431 48.58501 48.58501 TRUE
## 26 18.1034483 182.20738 8.1425053 48.58876 48.58876 TRUE
## 27 18.8275862 188.32856 8.5315304 48.59215 48.59215 TRUE
## 28 19.5517241 194.44973 8.9207839 48.59524 48.59524 TRUE
## 29 20.2758621 200.57091 9.3102372 48.59806 48.59806 TRUE
## 30 21.0000000 206.69208 9.6998662 48.60065 48.60065 TRUE
Relatedly, also HC robust SEs?
Relatedly, also HC robust SEs?
I think this is already possible, but it's a bit hard to find in the docs because you have to click through to get_predicted_ci
:
https://easystats.github.io/insight/reference/get_predicted_ci.html
https://easystats.github.io/insight/reference/get_predicted.html
w.r.t. previous post, there might be an argument to be made for bringing the vcov
-related arguments directly into the get_predicted
signature, for user-friendly discovery purposes.
And into the modelbased signatures
(We should probably merge the get_predicted and get_predicted_ci documentation page)
Also interesting in this context: https://github.com/glmmTMB/glmmTMB/blob/satterthwaite_df/glmmTMB/vignettes/satterthwaite_with_glmmTMB.Rmd
I'm wondering whether we should make this an internal function? I expect get_df()
to return df's for the parameters, not the predictions. We could also add Satterthwait/KR df's to get_df()
, but then use a more efficient way.
library(lme4)
#> Loading required package: Matrix
library(insight)
library(parameters)
model <- lme4::lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
dof_satterthwaite(model)
#> (Intercept) Petal.Length
#> 2.454955 144.560891
get_df(model, type = "satterthwaite", data = get_data(model))
#> [1] 2.097581 2.097581 2.116998 2.079048 2.097581 2.044592 2.097581 2.079048
#> [9] 2.097581 2.079048 2.079048 2.061388 2.097581 2.158524 2.137309 2.079048
#> [17] 2.116998 2.097581 2.044592 2.079048 2.044592 2.079048 2.180654 2.044592
#> [25] 2.013560 2.061388 2.061388 2.079048 2.097581 2.061388 2.061388 2.079048
#> [33] 2.079048 2.097581 2.079048 2.137309 2.116998 2.097581 2.116998 2.079048
#> [41] 2.116998 2.116998 2.116998 2.061388 2.013560 2.097581 2.061388 2.097581
#> [49] 2.079048 2.097581 1.913469 1.900500 1.929574 1.881629 1.906595 1.900500
#> [57] 1.913469 1.887420 1.906595 1.880160 1.881936 1.886868 1.881629 1.913469
#> [65] 1.880344 1.895183 1.900500 1.883865 1.900500 1.880160 1.921127 1.881629
#> [73] 1.929574 1.913469 1.890640 1.895183 1.921127 1.938814 1.900500 1.881936
#> [81] 1.879456 1.879518 1.880160 1.948852 1.900500 1.900500 1.913469 1.895183
#> [89] 1.883865 1.881629 1.895183 1.906595 1.881629 1.887420 1.886868 1.886868
#> [97] 1.886868 1.890640 1.901423 1.883865 2.076164 1.948852 2.058643 2.011223
#> [105] 2.041985 2.199959 1.900500 2.133999 2.041985 2.094557 1.948852 1.971345
#> [113] 1.997103 1.938814 1.948852 1.971345 1.997103 2.223802 2.274346 1.938814
#> [121] 2.026181 1.929574 2.223802 1.929574 2.026181 2.076164 1.921127 1.929574
#> [129] 2.011223 2.041985 2.094557 2.155069 2.011223 1.948852 2.011223 2.094557
#> [137] 2.011223 1.997103 1.921127 1.983812 2.011223 1.948852 1.948852 2.058643
#> [145] 2.026181 1.959694 1.938814 1.959694 1.983812 1.948852
microbenchmark::microbenchmark(
dof_satterthwaite(model),
get_df(model, type = "satterthwaite", data = get_data(model)),
10
)
#> Warning in microbenchmark::microbenchmark(dof_satterthwaite(model),
#> get_df(model, : Could not measure a positive execution time for 18 evaluations.
#> Unit: nanoseconds
#> expr min
#> dof_satterthwaite(model) 24243201
#> get_df(model, type = "satterthwaite", data = get_data(model)) 2675164601
#> 10 0
#> lq mean median uq max neval cld
#> 24753351 2.582621e+07 25208152 26284050 31916500 100 b
#> 2724342701 2.757082e+09 2738195701 2763005401 3204777800 100 c
#> 0 5.974000e+01 1 101 402 100 a
model2 <- lmerTest::lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)
summary(model2, ddf = "Satterthwaite")
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: Sepal.Length ~ Petal.Length + (1 | Species)
#> Data: iris
#>
#> REML criterion at convergence: 119.8
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -2.23334 -0.67332 -0.01684 0.67680 3.04429
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> Species (Intercept) 1.1617 1.0778
#> Residual 0.1143 0.3381
#> Number of obs: 150, groups: Species, 3
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 2.50446 0.66743 2.45496 3.752 0.0463 *
#> Petal.Length 0.88847 0.06379 144.56089 13.927 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> Petal.Lngth -0.359
dof_satterthwaite(model)
#> (Intercept) Petal.Length
#> 2.454955 144.560891
dof_satterthwaite(model2)
#> (Intercept) Petal.Length
#> 2.454955 144.560891
Created on 2022-04-29 by the reprex package (v2.0.1)
How about an argument to get df for either parameters or predictions, defaulting you parameters?
The argument sounds find, but I don't have a strong opinion about this. I'd like to know before releasing the next version of my package though ;)
I'd say we stick to the implementation as it is, and can add options for sw / kr df's for parameters (not predictions) later, controlled by an argument, that probably defaults to the current behaviour.
We could already decide on the additional argument now, even if it has no function yet (because we have satterthwaite for now only for predictions)? Then you can better "plan" or anticipate forthcoming changes in insight for your packages?
Aren't we moving degrees_of_freedom
to insight
?
Not moving, rather "integrating", I'd say. Much of the code from degrees_of_freedom()
was just re-used when implementing get_df()
. I would thus suggest to have the functionality in insight, while parameters just re-exports get_df()
and probably adds an degrees_of_freedom()
alias.
Let's not decide on anything now and release as-is. My guess is we'll develop stronger preferences about the specific interface once we do the "integration" work you suggest.
@strengejacke I believe that your commit https://github.com/easystats/insight/commit/87c092758ebaba97b8fbd029afe54b4418a8a8d8 broke the Satterthwaite DF estimation for get_predicted()
. When we call get_df()
, we now get one DF per coefficient. This is not what the old code was doing, and I don't think this is what we want to build CIs around predictions.
Edit: Noticed because the change broke marginaleffects
tests downstream.
Hm, this should still work, see https://github.com/easystats/insight/blob/e08e9f7fa4f7a2c680cdb1bb74ea0e8e062f47ca/R/get_predicted_ci.R#L256
OK, but how do I get the per observation degrees of freedom for use in my own functions in marginaleffects? Those used to be the default. Are they still exposed or only available to insight functions?
Ah, got it. I thought you were using get_predictions()
directly. I can add an argument that will return the per observation df's. But I think the default should be per parameter.
Ah, got it. I thought you were using
get_predictions()
directly. I can add an argument that will return the per observation df's. But I think the default should be per parameter.
Thanks, that would be great. And yes, I agree that default makes sense.
See last commit - should we make this change the next release (will break marginaleffects) or keep the old default, switching to per-parameter-df a release later?
I'm on the move, so I can't test it now, but at firdt glance this looks great. Thanks!
The last marginaleffects was released just a few days after the last insight. I think we can just keep things in sync like that and it should be fine. This is kind of a niche feature anyway, and people can install the dev if they want.
Looks like it's working now. Thanks again, I really appreciate the help on this!
You should be safe if you update your package before insight, right?
Yes, everything should be fine. I'm no longer running tests on CRAN...