BUG: preProcess creates some problems

Open dokato opened this issue 2 years ago • 0 comments

I get an error when I try to scale my data before training.

Minimal, reproducible example:

Minimal dataset:

# inpdat
inpdat <- structure(list(A = c(250.397844444444, 118.723101730769, 198.102258935361, 
72.0243680555556, 127.334116930023, 113.544560833333, 67.4810288322922, 
79.3384135865595, 90.612652173913, 151.005990983607, 54.3355680990532, 
43.3958625835189, 51.3791770580297, 51.8443565488566, 55.1642085339168, 
43.2120597040905, 194.876755664336, 47.4177031593407, 51.2903940594059, 
45.3199273784355), B = c(584.038690358063, 1183.06987874036, 
415.866468194858, 1388.7562443277, 1098.47736190407, 994.704031017159, 
1304.56725198883, 868.650798550627, 475.430093787274, 918.749907568257, 
651.154773281148, 688.563525926502, 346.074441144205, 462.572988323804, 
451.159845489153, 430.285952661533, 59.7743108216124, 593.713031627798, 
743.354471747694, 357.029809600858), y = structure(c(2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = c("O", "P"), class = "factor")), row.names = c("10042", 
"10059", "10074", "10083", "10098", "10099", "10114", "10116", 
"10124", "10137", "23738", "23739", "23740", "23741", "23742", 
"23744", "23746", "23748", "23749", "23750"), class = "data.frame")

Minimal, runnable code:

library(caret)
library(dplyr)

#inpdat - data from above

sampler <- function(x, y) {
  n <- 5
  sampled <- as.data.frame(x) %>%
    mutate(.y = as.vector(y)) %>%
    group_by(.y) %>%
    sample_n(n)
  list(x = x[, names(x) != ".y", drop = FALSE], y = y)
}

samp_info <- list(name = sampler, first = TRUE)

fitctrl <- trainControl(method = "repeatedcv",
                        number = 10, repeats = 5,
                        sampling = sampler)
lr_mod <- train(y ~ ., data = inpdat,
                preProcess = c('scale', 'center'),
                method = "glm",
                trControl = fitctrl)
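
Note that `sampler` as written computes `sampled` but then returns the original `x` and `y`, so no rows are actually dropped. A minimal base R sketch of a variant that returns the sampled rows, assuming that was the intent (the name `sampler_sub` and its `n` argument are hypothetical, and this is not claimed to resolve the error reported here):

```r
# Hypothetical variant of the sampler: draws up to n rows per class
# and returns only those rows.
sampler_sub <- function(x, y, n = 5) {
  # Row indices grouped by class label
  idx_by_class <- split(seq_along(y), y)
  # Sample within each class; sample.int avoids the sample() surprise
  # when a class has a single row
  keep <- unlist(
    lapply(idx_by_class, function(i) i[sample.int(length(i), min(n, length(i)))]),
    use.names = FALSE
  )
  # drop = FALSE keeps x two-dimensional, so column names are preserved
  list(x = x[keep, , drop = FALSE], y = y[keep])
}

# Toy data, not the dataset from this issue
x <- data.frame(A = 1:20, B = 20:1)
y <- factor(rep(c("P", "O"), each = 10))
res <- sampler_sub(x, y)
table(res$y)
```

Because the subsetting uses `drop = FALSE`, the same function should behave the same way whether `x` arrives as a data frame or as a matrix.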

But if you run it without preProcess:

lr_mod <- train(y ~ ., data = inpdat,
                method = "glm",
                trControl = fitctrl)

it works fine. Am I doing something wrong, or is this a bug?

The warning says:

model fit failed for Fold10.Rep5: parameter=none Error in get_types(x) : `x` must have column names

which is not very informative to me.
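
Since the message points at missing column names, one way to narrow this down might be to check what `x` looks like when it reaches the sampling function. A small base R sketch; `inspect_x` is a hypothetical helper, and calling it at the top of `sampler` (for example `str(inspect_x(x))`) is only a debugging suggestion, not a fix:

```r
# Hypothetical debugging helper: summarises the shape of an object
# and whether it carries column names.
inspect_x <- function(x) {
  list(class = class(x)[1],
       dims = dim(x),
       has_colnames = !is.null(colnames(x)))
}

# Toy inputs, not the data from this issue
m  <- matrix(1:6, ncol = 2)           # matrix without column names
df <- data.frame(A = 1:3, B = 4:6)    # data frame with column names

str(inspect_x(m))
str(inspect_x(df))
```

If `has_colnames` comes back `FALSE` inside the resampling loop, that would at least show where the names are lost.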

Session Info:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-90    lattice_0.20-44 ggplot2_3.3.5  

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1     purrr_0.3.4          reshape2_1.4.4       listenv_0.8.0       
 [5] splines_4.1.0        colorspace_2.0-2     vctrs_0.3.8          generics_0.1.0      
 [9] stats4_4.1.0         utf8_1.2.2           survival_3.2-11      prodlim_2019.11.13  
[13] rlang_0.4.11         ModelMetrics_1.2.2.2 pillar_1.6.3         glue_1.4.2          
[17] withr_2.4.2          DBI_1.1.1            foreach_1.5.1        lifecycle_1.0.1     
[21] plyr_1.8.6           lava_1.6.10          stringr_1.4.0        timeDate_3043.102   
[25] munsell_0.5.0        gtable_0.3.0         future_1.22.1        recipes_0.1.17      
[29] codetools_0.2-18     parallel_4.1.0       class_7.3-19         fansi_0.5.0         
[33] Rcpp_1.0.7           scales_1.1.1         ipred_0.9-12         parallelly_1.26.1   
[37] digest_0.6.28        stringi_1.7.5        dplyr_1.0.7          grid_4.1.0          
[41] tools_4.1.0          magrittr_2.0.1       tibble_3.1.5         crayon_1.4.1        
[45] future.apply_1.8.1   pkgconfig_2.0.3      ellipsis_0.3.2       MASS_7.3-54         
[49] Matrix_1.3-3         data.table_1.14.0    pROC_1.18.0          lubridate_1.7.10    
[53] gower_0.2.2          assertthat_0.2.1     iterators_1.0.13     R6_2.5.1            
[57] globals_0.14.0       rpart_4.1-15         nnet_7.3-16          nlme_3.1-152        
[61] compiler_4.1.0  

dokato, Oct 13 '21 15:10