BUG: preProcess creates some problems
I get an error when I try to scale my data before training.
Minimal, reproducible example:
Minimal dataset:
# inpdat
inpdat <-
structure(list(A = c(250.397844444444, 118.723101730769, 198.102258935361,
72.0243680555556, 127.334116930023, 113.544560833333, 67.4810288322922,
79.3384135865595, 90.612652173913, 151.005990983607, 54.3355680990532,
43.3958625835189, 51.3791770580297, 51.8443565488566, 55.1642085339168,
43.2120597040905, 194.876755664336, 47.4177031593407, 51.2903940594059,
45.3199273784355), B = c(584.038690358063, 1183.06987874036,
415.866468194858, 1388.7562443277, 1098.47736190407, 994.704031017159,
1304.56725198883, 868.650798550627, 475.430093787274, 918.749907568257,
651.154773281148, 688.563525926502, 346.074441144205, 462.572988323804,
451.159845489153, 430.285952661533, 59.7743108216124, 593.713031627798,
743.354471747694, 357.029809600858), y = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("O", "P"), class = "factor")), row.names = c("10042",
"10059", "10074", "10083", "10098", "10099", "10114", "10116",
"10124", "10137", "23738", "23739", "23740", "23741", "23742",
"23744", "23746", "23748", "23749", "23750"), class = "data.frame")
Minimal, runnable code:
library(caret)
library(dplyr)
#inpdat - data from above
sampler <- function(x, y) {
  n <- 5
  sampled <- as.data.frame(x) %>%
    mutate(.y = as.vector(y)) %>%
    group_by(.y) %>%
    sample_n(n)
  list(x = x[, names(x) != ".y", drop = FALSE], y = y)
}
samp_info <- list(name = sampler, first = TRUE)
fitctrl <- trainControl(method = "repeatedcv",
                        number = 10, repeats = 5,
                        sampling = sampler)
lr_mod <- train(y ~ ., data = inpdat,
                preProcess = c('scale', 'center'),
                method = "glm",
                trControl = fitctrl)
But if you run it without preProcess:
lr_mod <- train(y ~ ., data = inpdat,
                method = "glm",
                trControl = fitctrl)
it works fine. Am I doing something wrong, or is this a bug?
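A side note on the sampler itself: as written, it builds `sampled` but then returns the original `x` and `y`, so no subsampling actually reaches the model. Below is a minimal base R sketch of a variant that returns the subsample. The name `stratified_sampler` and the `n` argument are hypothetical, and I have not checked this against everything caret expects from a custom sampling function beyond returning a list with `x` and `y`.

```r
# Hypothetical helper: draw up to n rows per class and return the subsample.
# Row indexing works for both matrices and data frames and keeps column names.
stratified_sampler <- function(x, y, n = 5) {
  # Row indices grouped by class label
  idx_by_class <- split(seq_along(y), y)
  # sample.int() on the group size avoids the sample(i, n) pitfall
  # when a group has a single element
  picked <- lapply(idx_by_class, function(i) {
    i[sample.int(length(i), min(n, length(i)))]
  })
  idx <- unlist(picked, use.names = FALSE)
  list(x = x[idx, , drop = FALSE], y = y[idx])
}

# Usage on toy data (made up for illustration, not the data from this issue)
set.seed(42)
toy_x <- matrix(rnorm(40), ncol = 2, dimnames = list(NULL, c("A", "B")))
toy_y <- factor(rep(c("O", "P"), each = 10))
toy_out <- stratified_sampler(toy_x, toy_y, n = 5)
dim(toy_out$x)
table(toy_out$y)
```

Indexing rows by position rather than going through dplyr keeps `x` in whatever class it arrived in, which matters if `x` is not a data frame.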
The warning says:
model fit failed for Fold10.Rep5: parameter=none Error in get_types(x) : `x` must have column names
which is not very informative to me.
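One possible explanation, which I have not confirmed against the caret source: if `x` reaches the sampler as a matrix rather than a data frame (an assumption; the formula interface typically builds a model matrix), then `names(x)` is `NULL` and the column filter in my sampler selects zero columns. The base R sketch below shows only that indexing behaviour on toy data; it does not call caret.

```r
# Toy matrix and the equivalent data frame (made-up values)
m  <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("A", "B")))
df <- as.data.frame(m)

# A matrix has no names attribute, so the comparison yields logical(0)
names(m)
m_by_names <- m[, names(m) != ".y", drop = FALSE]
ncol(m_by_names)      # all columns are dropped

# colnames() works for both matrices and data frames
m_by_colnames <- m[, colnames(m) != ".y", drop = FALSE]
ncol(m_by_colnames)   # both columns are kept

# The original filter behaves as intended only on a data frame
df_by_names <- df[, names(df) != ".y", drop = FALSE]
ncol(df_by_names)
```

If this is what happens inside `train`, a zero-column `x` would be consistent with the "`x` must have column names" message, and switching the filter to `colnames(x)` (or not filtering at all, since `.y` is never added to `x`) would be the first thing to try.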
Session Info:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] caret_6.0-90 lattice_0.20-44 ggplot2_3.3.5
loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 purrr_0.3.4 reshape2_1.4.4 listenv_0.8.0
[5] splines_4.1.0 colorspace_2.0-2 vctrs_0.3.8 generics_0.1.0
[9] stats4_4.1.0 utf8_1.2.2 survival_3.2-11 prodlim_2019.11.13
[13] rlang_0.4.11 ModelMetrics_1.2.2.2 pillar_1.6.3 glue_1.4.2
[17] withr_2.4.2 DBI_1.1.1 foreach_1.5.1 lifecycle_1.0.1
[21] plyr_1.8.6 lava_1.6.10 stringr_1.4.0 timeDate_3043.102
[25] munsell_0.5.0 gtable_0.3.0 future_1.22.1 recipes_0.1.17
[29] codetools_0.2-18 parallel_4.1.0 class_7.3-19 fansi_0.5.0
[33] Rcpp_1.0.7 scales_1.1.1 ipred_0.9-12 parallelly_1.26.1
[37] digest_0.6.28 stringi_1.7.5 dplyr_1.0.7 grid_4.1.0
[41] tools_4.1.0 magrittr_2.0.1 tibble_3.1.5 crayon_1.4.1
[45] future.apply_1.8.1 pkgconfig_2.0.3 ellipsis_0.3.2 MASS_7.3-54
[49] Matrix_1.3-3 data.table_1.14.0 pROC_1.18.0 lubridate_1.7.10
[53] gower_0.2.2 assertthat_0.2.1 iterators_1.0.13 R6_2.5.1
[57] globals_0.14.0 rpart_4.1-15 nnet_7.3-16 nlme_3.1-152
[61] compiler_4.1.0