projpred icon indicating copy to clipboard operation
projpred copied to clipboard

`poly()` terms

Open fweber144 opened this issue 3 years ago • 3 comments

It seems like varsel() and cv_varsel() don't handle poly() predictor terms properly:

options(mc.cores = parallel::detectCores(logical = FALSE))
data("df_gaussian", package = "projpred")
mydat <- cbind("y" = df_gaussian$y, as.data.frame(df_gaussian$x))
library(rstanarm)
myfit <- stan_glm(y ~ V1 + poly(V2, 3) + V3 + V4 + V5,
                  data = mydat,
                  seed = 1140350788)

library(projpred)
my_vs <- varsel(myfit, nclusters = 3, nclusters_pred = 5, seed = 34632)
my_cvvs <- cv_varsel(myfit, nclusters = 3, nclusters_pred = 5, seed = 34632)

where the last two lines both throw the error

Error in str2lang(x) : <text>:1:31: unexpected numeric constant
1: . ~ V1 + V5 + V3 + poly(V2, 3)1

I guess poly() terms are rather rarely used, so this should not be too urgent.

fweber144 avatar Aug 02 '21 08:08 fweber144

The problem seems to be search_L1() here.

fweber144 avatar Aug 02 '21 08:08 fweber144

I guess the problem is rather in split formula as we don’t recognize these terms currently.

Best, Alex


From: Frank Weber @.> Sent: Monday, August 2, 2021 11:05:10 AM To: stan-dev/projpred @.> Cc: Subscribed @.***> Subject: Re: [stan-dev/projpred] poly() terms (#183)

The problem seems to be search_L1() here.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/stan-dev/projpred/issues/183#issuecomment-890813402, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABZ5FH4FIY4KJZ7CPH2JKBLT2ZGTNANCNFSM5BMGIJ4Q.

AlejandroCatalina avatar Aug 02 '21 08:08 AlejandroCatalina

Yes, perhaps

fweber144 avatar Aug 02 '21 09:08 fweber144

The problem indeed turned out to be L1 search (more precisely: collapse_contrasts_solution_path()). I'll push a fix asap. What I wanted to add here is that in my reprex above, the cv_varsel() call cannot work anyway (even with the poly() bug fixed) because it would throw

Error in poly(V2, 3) : 'degree' must be less than number of unique points

To make it work, either

my_cvvs_K <- cv_varsel(myfit, cv_method = "kfold", nclusters = 3, nclusters_pred = 5, seed = 34632)

or

myfit_raw <- stan_glm(y ~ V1 + poly(V2, 3, raw = TRUE) + V3 + V4 + V5,
                      data = mydat,
                      seed = 1140350788)
my_cvvs_raw <- cv_varsel(myfit_raw, nclusters = 3, nclusters_pred = 5, seed = 34632)

needs to be used.

Also note that apart from poly(), there is also polym().

fweber144 avatar May 01 '23 13:05 fweber144