feature request: Handling Inf in data

Open wpetry opened this issue 4 months ago • 2 comments

Description of current behavior

When Inf or -Inf occur in the data, brm passes these rows to Stan, which fails because it cannot evaluate the log probability (lp) at the initial values. I think the standard troubleshooting for this error is to specify init = ... and/or to use more informative priors, but Stan fails with the same error regardless of the initial values or priors specified. The failure therefore looks like a fitting issue when the real source of the problem is the data.

reprex:

library(brms)

x <- 0:100
mu <- 10 + 0.3 * x
y <- rnorm(length(x), mean = mu, sd = 2)  # first argument of rnorm() is n, so mu must be passed as mean
dat <- data.frame(x, y)
dat$y[1] <- Inf

mod <- brm(y ~ 1 + x, data = dat)  # fails with Stan initialization error
mod2 <- lm(y ~ 1 + x, data = dat)  # base R regression gives a (somewhat) informative error in the same circumstance

Desired feature behavior

I think the best approach would be to stop the model fitting with an informative error instead of a warning. Infinite values are likely artifacts of errors during the calculation of variables and warrant re-examination before fitting any model (e.g., dividing by 0, log-transforming 0, etc.).
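A minimal sketch of what such a check could look like on the R side, before the data reach Stan. `check_infinite()` is a hypothetical helper written for illustration, not an existing brms function, and it scans every numeric column rather than only the variables used in the formula.

```r
# Hypothetical helper (not part of brms): stop with an informative error
# if any numeric column of `data` contains Inf or -Inf.
check_infinite <- function(data) {
  # is.infinite() is TRUE only for Inf and -Inf, never for NA or NaN
  has_inf <- vapply(
    data,
    function(col) is.numeric(col) && any(is.infinite(col)),
    logical(1)
  )
  if (any(has_inf)) {
    stop(
      "Infinite values found in column(s): ",
      paste(names(data)[has_inf], collapse = ", "),
      ". Check how these variables were computed before fitting.",
      call. = FALSE
    )
  }
  invisible(data)
}

# Usage: capture the error message instead of aborting the script
dat <- data.frame(x = 0:4, y = c(Inf, 1.2, 0.7, 2.1, 1.9))
res <- tryCatch(check_infinite(dat), error = function(e) conditionMessage(e))
```

Here `res` holds the error message naming the offending column, so the user is pointed at the data rather than at the initial values or priors.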

A softer approach would be to drop rows containing infinite values with a warning on the R side, then pass the cleaned data to Stan for fitting. This mirrors the handling of rows containing NA (absent user-specified imputation with mi()). I don't favor this approach because I'm not able to think of cases when it's still reasonable to fit a model after learning that some of the variable values are infinite.
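For comparison, the softer approach could be sketched as follows. `drop_infinite_rows()` is again a hypothetical helper, not a brms function; it assumes that any row with an infinite value in any numeric column should be dropped.

```r
# Hypothetical helper (not part of brms): drop rows that contain Inf or -Inf
# in any numeric column, and warn about how many rows were removed.
drop_infinite_rows <- function(data) {
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  bad <- rep(FALSE, nrow(data))
  for (col in num_cols) {
    # is.infinite(NA) is FALSE, so rows with NA are left for the usual NA handling
    bad <- bad | is.infinite(data[[col]])
  }
  if (any(bad)) {
    warning(
      "Dropped ", sum(bad), " row(s) containing infinite values.",
      call. = FALSE
    )
  }
  data[!bad, , drop = FALSE]
}

# Usage: the first row is removed and a warning is emitted
dat <- data.frame(x = 0:4, y = c(Inf, 1.2, 0.7, 2.1, 1.9))
cleaned <- drop_infinite_rows(dat)
```

The cleaned data frame could then be passed to `brm()` as usual, although the warning is easy to overlook, which is part of why I prefer the hard error.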

wpetry, Oct 16 '24 16:10