dummyVars ignores sep argument for character variables
Hey,
I spotted a potential issue with dummyVars: when a character variable (instead of a factor variable) is included in the RHS of the formula, the sep argument is ignored for that variable (i.e. the separator between variable name and level is not inserted).
It's a minor issue, but it took me some time to track down the cause. Is this expected behaviour? If not, I'm happy to invest some time in finding a fix.
Cheers
Minimal, reproducible example:
library(earth)
library(tidyverse)
library(magrittr)
library(caret)
data(etitanic)
# this works fine, giving the correct separators between variable name and level
dummies <- dummyVars(survived ~ ., data = etitanic, sep = ".")
head(predict(dummies, newdata = etitanic))
#> pclass.1st pclass.2nd pclass.3rd sex.female sex.male age sibsp parch
#> 1 1 0 0 1 0 29.0000 0 0
#> 2 1 0 0 0 1 0.9167 1 2
#> 3 1 0 0 1 0 2.0000 1 2
#> 4 1 0 0 0 1 30.0000 1 2
#> 5 1 0 0 1 0 25.0000 1 2
#> 6 1 0 0 0 1 48.0000 0 0
# after converting a variable to character, dummyVars fails to insert the separator character
etitanic %<>% mutate(pclass = as.character(pclass))
dummies <- dummyVars(survived ~ ., data = etitanic, sep = ".")
head(predict(dummies, newdata = etitanic))
#> pclass1st pclass2nd pclass3rd sex.female sex.male age sibsp parch
#> 1 1 0 0 1 0 29.0000 0 0
#> 2 1 0 0 0 1 0.9167 1 2
#> 3 1 0 0 1 0 2.0000 1 2
#> 4 1 0 0 0 1 30.0000 1 2
#> 5 1 0 0 1 0 25.0000 1 2
#> 6 1 0 0 0 1 48.0000 0 0
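Possible workaround until this is fixed: since the factor version above gets the separators, converting character columns back to factors before calling dummyVars should sidestep the problem. This is only a minimal sketch. The helper chars_to_factors and the toy data frame are made up for illustration and are not part of caret, and I have not checked how this interacts with other arguments.

```r
# Hypothetical helper (not part of caret): convert every character column
# of a data frame to a factor and leave all other columns untouched.
chars_to_factors <- function(df) {
  df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)
  df
}

# Small made-up data frame so the sketch does not depend on etitanic.
toy <- data.frame(
  pclass = c("1st", "2nd", "3rd", "1st"),
  age = c(29, 0.9167, 2, 30),
  stringsAsFactors = FALSE
)
toy_fct <- chars_to_factors(toy)

# Only run the caret part when caret is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  dummies <- caret::dummyVars(~ ., data = toy_fct, sep = ".")
  print(colnames(predict(dummies, newdata = toy_fct)))
}
```

The same conversion can be applied to etitanic right before the second dummyVars call in the example above.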
Session Info:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 caret_6.0-81 lattice_0.20-35 magrittr_1.5 forcats_0.3.0 stringr_1.3.1
[7] dplyr_0.7.6 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0
[13] tidyverse_1.2.1 earth_4.7.0 plotmo_3.5.2 TeachingDemos_2.10 plotrix_3.7-4
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 lubridate_1.7.4 class_7.3-14 assertthat_0.2.0 ipred_0.9-7 foreach_1.4.4
[7] R6_2.2.2 cellranger_1.1.0 plyr_1.8.4 backports_1.1.2 stats4_3.5.1 httr_1.3.1
[13] pillar_1.3.0 rlang_0.3.1 lazyeval_0.2.1 readxl_1.1.0 rstudioapi_0.7 data.table_1.11.4
[19] rpart_4.1-13 Matrix_1.2-14 splines_3.5.1 gower_0.1.2 munsell_0.5.0 broom_0.5.0
[25] compiler_3.5.1 modelr_0.1.2 pkgconfig_2.0.1 nnet_7.3-12 tidyselect_0.2.4 prodlim_2018.04.18
[31] codetools_0.2-15 crayon_1.3.4 withr_2.1.2 MASS_7.3-50 recipes_0.1.4 ModelMetrics_1.2.0
[37] grid_3.5.1 nlme_3.1-137 jsonlite_1.6 gtable_0.2.0 scales_1.0.0 cli_1.0.0
[43] stringi_1.2.4 reshape2_1.4.3 timeDate_3043.102 xml2_1.2.0 generics_0.0.2 lava_1.6.3
[49] iterators_1.0.10 tools_3.5.1 glue_1.3.0 hms_0.4.2 survival_2.42-3 yaml_2.2.0
[55] colorspace_1.3-2 rvest_0.3.2 bindr_0.1.1 haven_1.1.2
I'll try to add an MRE for this too, but I'm seeing the same behavior for the fullRank and levelsOnly arguments, i.e. dummyVars seems to ignore them if the RHS has character columns. Thanks to OP for calling this behavior out!
This is still broken.