variable selection for uncorrelated random variables
Hi
I want to fit a Poisson model with uncorrelated random variables and perform projection prediction. Is it true that this is not yet possible with projpred or can I specify my model in a different way such that I can use uncorrelated random variables? I also wonder if it matters for projection prediction whether they are specified as correlated or not.
Unexpected number of | characters in group terms. Please contact the package maintainer.
library(brms)
#> Loading required package: Rcpp
#> Warning: package 'Rcpp' was built under R version 4.3.1
#> Loading 'brms' package (version 2.19.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#>
#> ar
library(projpred)
#> Warning: package 'projpred' was built under R version 4.3.1
#> This is projpred version 2.6.0.
#>
#> Attaching package: 'projpred'
#> The following object is masked from 'package:brms':
#>
#> do_call
# simulate data
species <- paste("species", 1:5, sep = "_")
periods <- paste("period", 1:2, sep = "_")
df <- expand.grid(species, periods)
set.seed(123)
lambdas <- c(rgamma(length(species), 0.7, 0.2),
rgamma(length(species), 0.5, 0.25))
data_list <- lapply(seq_along(df[, 1]), function(i) {
out <- tidyr::expand_grid(df[i, ], rpois(30, lambdas[i]))
names(out) <- c("species", "period", "count")
return(out)
})
dat_df <- do.call(rbind.data.frame, data_list)
# MCMC parameters
nchains <- 3 # number of chains
niter <- 4000 # number of iterations (incl. burn-in)
burnin <- niter / 4 # number of initial samples to discard (burn-in)
nparallel <- nchains # number of cores used for parallel computing
## Uncorrelated random random variables
# Fit model
fit1 <- brm(bf(count ~ period + (1 + period || species)),
data = dat_df,
family = poisson(),
chains = nchains, warmup = burnin, iter = niter, cores = nparallel,
control = list(adapt_delta = 0.99,
max_treedepth = 12))
#> Compiling Stan program...
#> Start sampling
# Projection prediction
cvvs_fast1 <- cv_varsel(fit1,
method = "forward",
nclusters_pred = 20, # speed
cv_method = "LOO",
# only for the sake of speed
validate_search = FALSE)
#> Error in FUN(X[[i]], ...): Unexpected number of `|` characters in group terms. Please contact the package maintainer.
Created on 2023-07-24 with reprex v2.0.2
Kind regards, Ward
Hi Ward,
Thank you for this issue.
According to your brms code, you have uncorrelated group-level intercepts and period slopes for species (because of the || syntax; see the brms documentation and Table 2 in the lme4 vignette Fitting Linear Mixed-Effects Models Using lme4; beware that "uncorrelated" refers to zero correlation of the group-level intercept (regarded as a random variable) and the group-level period slope (regarded as a random variable), so to zero correlation in the corresponding bivariate normal distribution).
You are right that this is not supported by projpred yet. Sorry :/
I'll leave this issue open because (i) we should throw a more informative error message in that case and (ii) as a reminder that this could be implemented in the future.
I forgot to answer two more specific questions:
can I specify my model in a different way such that I can use uncorrelated random variables?
Currently, I don't think this is possible.
I also wonder if it matters for projection prediction whether they are specified as correlated or not.
Yes, it does. But the reason why I added some more explanations concerning the || syntax above is that given these explanations, you might perhaps decide that you don't need the || syntax and instead rely on the ordinary | syntax.
Thank you for your reply and extra information!