gratia
gratia copied to clipboard
Parametric term evaluation for ziplss models
I'm working with models from the ziplss family that have parametric (i.e. non-smooth) terms, and these cause the evaluate_parametric_term
method (and hence draw
) to fail with an error message like
Error in evaluate_parametric_term.gam(object, term = terms[i]) : Term is not in the parametric part of model: <x3>
,
because it does not handle the list structure of the two predictor formulae gracefully.
Reprex collapsed in here
#gratia error reprex
library(mgcv)
library(gratia)
## simulate some data...
f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 *
(10 * x)^3 * (1 - x)^10
n <- 500;set.seed(5)
x0 <- runif(n); x1 <- runif(n)
x2 <- runif(n); x3 <- runif(n)
## Simulate probability of potential presence...
eta1 <- f0(x0) + f1(x1) - 3
p <- binomial()$linkinv(eta1)
y <- as.numeric(runif(n)<p) ## 1 for presence, 0 for absence
## Simulate y given potentially present (not exactly model fitted!)...
ind <- y>0
eta2 <- f2(x2[ind])/3
y[ind] <- rpois(exp(eta2),exp(eta2))
## Fit ZIP model...
b <- gam(list(y~s(x2)+s(x3),~s(x0)+s(x1)),family=ziplss())
draw(b)
b1 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b1) # fails
plot(b1,pages=1, all.terms = TRUE) #works
I've traced the issue to
https://github.com/gavinsimpson/gratia/blob/245ac45e87d64ef347ff8bf90af372cbf0273590/R/evaluate_smooth.R#L546
which doesn't behave as expected as tt
is now a two-element list with labels "1" "2"
.
My hacky attempt to fix this by changing the above line to
vars <- unlist(lapply(tt, labels))
produces reasonable results if there is only a single parameteric predictor in the model, but falls over with a cryptic plot_grid
error that I can trace to align_margin
, but which I don't understand:
#after hack-fixing evaluate_parametric_term
b2 <- gam(list(y~s(x2)+x3,~s(x0)+s(x1),family=ziplss())
draw(b2) #this works
b3 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b3) # this fails
#Error in 1:(grep("null", x)[1] - 1) : NA/NaN argument
#In addition: Warning message:
#In predict.gam(object, newdata = mf, type = "terms", terms = term, :
# non-existent terms requested - ignoring
#Called from: FUN(X[[i]], ...)
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gratia_0.2-8 mgcv_1.8-28 nlme_3.1-140
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 rstudioapi_0.10 magrittr_1.5 splines_3.6.1 tidyselect_0.2.5 munsell_0.5.0
[7] cowplot_1.0.0 colorspace_1.4-1 lattice_0.20-38 R6_2.4.0 rlang_0.4.0 dplyr_0.8.3
[13] tools_3.6.1 packrat_0.5.0 grid_3.6.1 gtable_0.3.0 lazyeval_0.2.2 assertthat_0.2.1
[19] tibble_2.1.3 crayon_1.3.4 Matrix_1.2-17 tidyr_0.8.3 purrr_0.3.2 ggplot2_3.2.0
[25] glue_1.3.1 labeling_0.3 compiler_3.6.1 pillar_1.4.2 scales_1.0.0 mvtnorm_1.0-11
[31] pkgconfig_2.0.2
Hi Philipp, yeah, handling these extended formula models (distributional regression models) is not something gratia does at the moment; Things work for the smooths as Simon just stores them as if they were just a single model with many smooths. The parametric terms code was written without considering those more general models.
This is on the ToDo list that I hope to get to once I get a new course written/start of term, whichever happened first.
I worked out what was going on; a combination of parametric_terms()
returning a list for models with more than one linear predictor, and then needing to rename terms according to mgcvs weird convention of add .1, .2 to all smooths and terms in the higher linear predictors.
This should now work for all LSS models, but I have thus far only check the ziplss examples you provided.
Thanks for reporting this.
Hi Gavin, thanks so much for looking into this and expanding the functionality of this great package!
Unfortunately I discovered that there's a further complication in my use case that wasn't covered by my original reprex. It appears that categorical predictors cause additional problems.
Amended reprex:
#gratia error reprex
library(mgcv)
library(gratia)
## simulate some data...
f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 *
(10 * x)^3 * (1 - x)^10
n <- 500;set.seed(5)
x0 <- runif(n); x1 <- runif(n)
x2 <- runif(n); x3 <- runif(n)
x4 <- sample(factor(c('a','b','c')), size = n, replace = TRUE)
## Simulate probability of potential presence...
eta1 <- f0(x0) + f1(x1) - 3
p <- binomial()$linkinv(eta1)
y <- as.numeric(runif(n)<p) ## 1 for presence, 0 for absence
## Simulate y given potentially present (not exactly model fitted!)...
ind <- y>0
eta2 <- f2(x2[ind])/3
y[ind] <- rpois(exp(eta2),exp(eta2))
## Fit ZIP model...
b <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b)
#ZIP model with a categorical predictor
b0 <- gam(list(y~s(x2)+x4,~s(x0)+x1),family=ziplss())
draw(b0)# works but warning issued
plot(b0,pages=1, all.terms = TRUE) #works
#ZIP model with linear and categorical predictor
b1 <- gam(list(y~s(x2)+x3+x4,~s(x0)+x1),family=ziplss())
draw(b1) # fails
draw(b1, parametric = FALSE) # works
plot(b1,pages=1, all.terms = TRUE) #works
Having a single categorical predictor yields a warning, but the partial effect plot appears.
Warning messages:
1: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
variable 'x4' is absent, its contrast will be ignored
Having a categorical predictor and another term leads to a failure
Error: Columns 3, 4, 5, 6, 7, ... (and 3 more) must be named.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace
In addition: Warning messages:
1: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
variable 'x4' is absent, its contrast will be ignored
2: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
variable 'x4' is absent, its contrast will be ignored
3: In predict.gam(object, newdata = mf, type = "terms", terms = mgcv_names[ind], :
non-existent terms requested - ignoring
with the following traceback
<error>
message: Columns 3, 4, 5, 6, 7, ... (and 3 more) must be named.
Use .name_repair to specify repair.
class: `rlang_error`
backtrace:
1. gratia::draw(b1)
2. gratia:::draw.gam(b1)
4. gratia:::evaluate_parametric_term.gam(object, term = terms[i])
6. tibble:::as_tibble.data.frame(evaluated)
7. tibble:::as_tibble.list(unclass(x), ..., .rows = .rows, .name_repair = .name_repair)
8. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
9. tibble:::set_repaired_names(x, .name_repair)
14. tibble:::repaired_names(names(x), .name_repair = .name_repair)
15. tibble:::check_unique(new_name)
Call `rlang::last_trace()` to see the full backtrace
which to me appears to indicate that names are being lost as they are being passed from the model frame to the prediction frames, but my knowledge of programming with tibbles is very limited. Any clues?
Thanks for the revised example; I was likely missing some subtleties when implementing this. mgcv can fit such a range of models and this was worked on for simpler models; guess I need to expand the test suites to include some more complex examples.
I’ll take a look at why this is failing.
Finally found some time to fix this after I'd completely changed the way I handle parametric effects (evaluate_parametric_term()
has bee soft deprecated for a little while now). Sorry it took so long.