sdmTMB icon indicating copy to clipboard operation
sdmTMB copied to clipboard

What should predict.sdmTMB return when offset is left NULL?

Open seananderson opened this issue 8 months ago • 1 comments

I just realized that glm() and glmmTMB both always include the offset in the prediction and always return the prediction on the original data no matter what newdata is if the offset argument is used instead of the offset() formula. This seems kind of crazy to me, but I guess that's the standard.

library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)

m <- glm(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept))
p1 <- predict(m)
p2 <- predict(m, newdata = dat[1:2,])
#> Warning in predictor + offset: longer object length is not a multiple of
#> shorter object length

p1[1:10]
#>        1        4        5        6        7        9       10       11 
#> 4.863265 5.035115 4.798726 5.035115 4.923889 4.798726 5.035115 4.863265 
#>       12       13 
#> 4.923889 4.798726
p2[1:10]
#>  [1] 4.863265 5.035115 4.798726 5.035115 4.923889 4.798726 5.035115 4.863265
#>  [9] 4.923889 4.798726


m <- glmmTMB::glmmTMB(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept))
p1 <- predict(m)
p2 <- predict(m, newdata = dat[1:2,])

p1[1:10]
#>  [1] 4.863263 5.035113 4.798724 5.035113 4.923887 4.798724 5.035113 4.863263
#>  [9] 4.923887 4.798724
p2[1:10]
#>  [1] 4.863263 5.035113 4.798724 5.035113 4.923887 4.798724 5.035113 4.863263
#>  [9] 4.923887 4.798724

Created on 2023-11-03 with reprex v2.0.2

That changes if you put the offset in the formula, which sdmTMB does not currently support:

library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)

m <- glm(catch_weight ~ 1 + offset(area_swept), data = dat, family = Gamma("log"))
p1 <- predict(m)
nd <- dat[1:2,]
nd$area_swept <- 0
p2 <- predict(m, newdata = nd)

p1[1:10]
#>        1        4        5        6        7        9       10       11 
#> 4.970636 4.989926 4.964206 4.989926 4.977066 4.964206 4.989926 4.970636 
#>       12       13 
#> 4.977066 4.964206
p2
#>        1        4 
#> 4.867756 4.867756

m <- glmmTMB::glmmTMB(catch_weight ~ 1 + offset(area_swept), data = dat, family = Gamma("log"))
p1 <- predict(m)
p2 <- predict(m, newdata = nd)

p1[1:10]
#>  [1] 4.970636 4.989926 4.964206 4.989926 4.977066 4.964206 4.989926 4.970636
#>  [9] 4.977066 4.964206
p2
#> [1] 4.867756 4.867756

Created on 2023-11-03 with reprex v2.0.2

We need the ability to set the offset to 0 for index standardization, among other uses. It's obvious what to do when newdata and offset are specified in predict.sdmTMB(). But what about:

newdata = NULL, offset = NULL: predict on original data with offset = 0 newdata = df, offset = NULL: predict on df with offset = 0 newdata = NULL, offset = some_offset: predict on original data with offset = some_offset; this was wrong and was being returned as original data and original offset

I assume I should do the above. For one, people have code that depends on it. I never realized the difference in the formula vs. argument offset specification.

Just writing this out to document the thought process.

seananderson avatar Nov 03 '23 23:11 seananderson