projpred
projpred copied to clipboard
`poly()` terms
It seems like varsel()
and cv_varsel()
don't handle poly()
predictor terms properly:
options(mc.cores = parallel::detectCores(logical = FALSE))
data("df_gaussian", package = "projpred")
mydat <- cbind("y" = df_gaussian$y, as.data.frame(df_gaussian$x))
library(rstanarm)
myfit <- stan_glm(y ~ V1 + poly(V2, 3) + V3 + V4 + V5,
data = mydat,
seed = 1140350788)
library(projpred)
my_vs <- varsel(myfit, nclusters = 3, nclusters_pred = 5, seed = 34632)
my_cvvs <- cv_varsel(myfit, nclusters = 3, nclusters_pred = 5, seed = 34632)
where the last two lines both throw the error
Error in str2lang(x) : <text>:1:31: unexpected numeric constant
1: . ~ V1 + V5 + V3 + poly(V2, 3)1
I guess poly()
terms are rather rarely used, so this should not be too urgent.
The problem seems to be search_L1()
here.
I guess the problem is rather in split formula as we don’t recognize these terms currently.
Best, Alex
From: Frank Weber @.>
Sent: Monday, August 2, 2021 11:05:10 AM
To: stan-dev/projpred @.>
Cc: Subscribed @.***>
Subject: Re: [stan-dev/projpred] poly()
terms (#183)
The problem seems to be search_L1() here.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/stan-dev/projpred/issues/183#issuecomment-890813402, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABZ5FH4FIY4KJZ7CPH2JKBLT2ZGTNANCNFSM5BMGIJ4Q.
Yes, perhaps
The problem indeed turned out to be L1 search (more precisely: collapse_contrasts_solution_path()
). I'll push a fix asap. What I wanted to add here is that in my reprex above, the cv_varsel()
call cannot work anyway (even with the poly()
bug fixed) because it would throw
Error in poly(V2, 3) : 'degree' must be less than number of unique points
To make it work, either
my_cvvs_K <- cv_varsel(myfit, cv_method = "kfold", nclusters = 3, nclusters_pred = 5, seed = 34632)
or
myfit_raw <- stan_glm(y ~ V1 + poly(V2, 3, raw = TRUE) + V3 + V4 + V5,
data = mydat,
seed = 1140350788)
my_cvvs_raw <- cv_varsel(myfit_raw, nclusters = 3, nclusters_pred = 5, seed = 34632)
needs to be used.
Also note that apart from poly()
, there is also polym()
.