gratia
gratia copied to clipboard
Pointwise intervals using gratia vs predict.gam discrepancy
It seems the confint() method in {gratia} isn't producing the correct predictions in a multiple time series GAM with group level smoothers for each factor and a random intercept for each group.
When simulating the data, I've simulated one series to have the same relationship with the response, but a higher average response as compared to the other one.
The predictions from confint() seem not to be taking in the random intercepts into account.
library(tidyverse)
library(mgcv)
#> Loading required package: nlme
#>
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#>
#> collapse
#> This is mgcv 1.8-27. For overview type 'help("mgcv-package")'.
library(gratia)
library(janitor)
#>
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#>
#> chisq.test, fisher.test
### Create two time series with the same data generating process, but spiked with noise to create artificial differences
x1 <- arima.sim(n = 1e3, list(ar = 0.4), rand.gen = function(n, ...) rnorm(n, 0,1))
x2 <- arima.sim(n = 1e3, list(ar = 0.4), rand.gen = function(n, ...) rnorm(n, 0,1))
## spike in noise to create a differnece in the two time series
f1 <- x1 + 0.5*x1^2 + 0.8*x1^3 + rnorm(1e3,12,2)
f1 <- cbind(f1, x1) %>%
data.frame() %>%
mutate(sample = rep("x1",nrow(.))) %>%
rename(y = f1, time = x1)
f2 <- x2 + 0.5*x2^2 + 0.8*x2^3 + rnorm(1e3,0,2)
f2 <- cbind(f2, x2) %>%
data.frame() %>%
mutate(sample = rep("x2",nrow(.))) %>%
rename(y = f2, time = x2)
## Create the dataframe
df <- bind_rows(f1, f2) %>%
mutate(sample = as_factor(sample))
### Plot the raw data
df %>%
ggplot(., aes(time, y, group = sample)) + geom_line(aes(color = sample))
### Fit the model
sim_gam <- mgcv::gam(y ~ s(time, by = sample) + s(sample, bs = "re", k = 2), data = df,
method = "REML")
### Enter gratia - calculate pointwise confidence intervals
grid <- expand.grid(time = seq(-3,3, by = 0.01), sample = levels(df$sample))
confint(sim_gam,parm = "time", newdata = grid, unconditional = T, type = "confidence") %>%
ggplot(.,aes(time, est, group = sample)) + geom_line(aes(color = sample)) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.5) + labs(title = "Pointwise from {Gratia}")
### Below is the code to calculate the same intervals manually
pred_list <- predict.gam(sim_gam, newdata = grid, unconditional = T, se.fit = T)
preds <- pred_list$fit %>%
tibble() %>%
janitor::clean_names() %>%
rename(fit = x)
cis <- pred_list$se.fit %>%
tibble() %>%
janitor::clean_names() %>%
rename(se = x)
bind_cols(preds, cis, grid) %>%
mutate(lower = fit - 2*se, upper = fit + 2*se) %>%
ggplot(.,aes(time, fit, group = sample)) + geom_line(aes(color = sample)) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.5) + labs(title = "Pointwise by predict.gam")
sessionInfo()
#> R version 3.5.3 (2019-03-11)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 16299)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] janitor_1.2.0 gratia_0.2-11 mgcv_1.8-27 nlme_3.1-137
#> [5] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.2
#> [9] readr_1.3.1 tidyr_1.0.0 tibble_2.1.3 ggplot2_3.2.1
#> [13] tidyverse_1.2.1
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_0.2.5 xfun_0.7 splines_3.5.3 haven_2.1.0
#> [5] lattice_0.20-38 snakecase_0.11.0 colorspace_1.4-1 vctrs_0.2.0
#> [9] generics_0.0.2 htmltools_0.3.6 yaml_2.2.0 rlang_0.4.0
#> [13] pillar_1.4.2 glue_1.3.1 withr_2.1.2 modelr_0.1.4
#> [17] readxl_1.3.1 lifecycle_0.1.0 munsell_0.5.0 gtable_0.3.0
#> [21] cellranger_1.1.0 rvest_0.3.4 mvnfast_0.2.5 evaluate_0.14
#> [25] labeling_0.3 knitr_1.23 highr_0.8 broom_0.5.2
#> [29] Rcpp_1.0.2 scales_1.0.0 backports_1.1.4 jsonlite_1.6
#> [33] hms_0.4.2 digest_0.6.21 stringi_1.4.3 cowplot_1.0.0
#> [37] grid_3.5.3 cli_1.1.0 tools_3.5.3 magrittr_1.5
#> [41] lazyeval_0.2.2 crayon_1.3.4 pkgconfig_2.0.3 zeallot_0.1.0
#> [45] ellipsis_0.3.0 Matrix_1.2-15 xml2_1.2.0 lubridate_1.7.4
#> [49] assertthat_0.2.1 rmarkdown_1.13 httr_1.4.0 R6_2.4.0
#> [53] compiler_3.5.3
Created on 2019-10-04 by the reprex package (v0.3.0)
This is probably some conflict with selecting terms (should be smooths); does it work if you don't specify parm
?
I'll try to take a look later.
Thank you! No - I tried leaving out parm but it throws an error.
Error in add_s(parm) : argument "parm" is missing, with no default
I tested it using the 0.2 -8 version as well, but the same thing.
Appreciate all your effort on this wonderful library!
Having had another think about this, gratia:::confint.gam
is doing the right thing. confint
methods are traditionally for parameters of a model, not predictions from the model. As factor by
smooths are centred, the confidence interval on the smooths (in your example, if I follow how you created it) should be essentially the same; they'd be different if the underlying polynomials were different.
What you want is a confidence interval on the combinations of the smooth plus the parametric terms involved in the by
factor smooth. That's a reasonable request but not something for confint()
given the precedence of what this function has traditionally been used for. I'll have a think about how to best provide what you want; from v 0.3.0 onward gratia hasfitted_samples()
and predicted_samples()
which could be used to do (almost) what you want, but they only know how to work with the full model specification not selected parts of it. More generally what you want is something similar to fitted_samples()
but with an exclude
option sensu predict.gam()
's exclude
argument which can set some terms to zero (or perhaps terms
to select only the ones you want...)
I think this is now quite doable using fitted_values()
:
ds <- data_slice(sim_gam, time = seq(-3,3, by = 0.01), sample = evenly(sample))
fv <- fitted_values(sim_gam, data = ds)
fv |>
ggplot(aes(x = time, y = .fitted, group = sample)) +
geom_ribbon(aes(ymin = .lower_ci, ymax = .upper_ci, fill = sample), alpha = 0.2) +
geom_line(aes(colour = sample))
As this isn't something I would ever expect confint()
to do as that's not it's intended use, I'm going to close this.