caret xgb na.action=na.pass does not work while using recipes
The same dataset with a single NA value fails in the caret + recipes pipeline, whereas plain caret (the formula interface) trains on it without any issues.
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
data(cars)
cars$Mileage[100] <- NA
## Without recipes
train(Price ~ .,
      trControl = trainControl(
        method = 'CV',
        number = 3 # Reduced the number of CV folds. Otherwise we would get a bunch of warnings
      ),
      data = cars,
      tuneLength = 1,
      method = "xgbLinear",
      objective = "reg:squarederror",
      na.action = na.pass)
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#> objective
#> Only the last value for each of them will be used.
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#> objective
#> Only the last value for each of them will be used.
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#> objective
#> Only the last value for each of them will be used.
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#> objective
#> Only the last value for each of them will be used.
#> eXtreme Gradient Boosting
#>
#> 804 samples
#> 17 predictor
#>
#> No pre-processing
#> Resampling: Cross-Validated (3 fold)
#> Summary of sample sizes: 536, 536, 536
#> Resampling results:
#>
#> RMSE Rsquared MAE
#> 2442.675 0.9394766 1674.597
#>
#> Tuning parameter 'nrounds' was held constant at a value of 50
#> Tuning parameter 'alpha' was held constant at a value of 0
#> Tuning parameter 'eta' was held constant at a value of 0.3
## With recipes
rec <- recipe(Price ~., data = cars)
train(rec,
      data = cars,
      trControl = trainControl(
        method = 'CV',
        number = 3
      ),
      tuneLength = 1,
      method = "xgbLinear",
      objective = "reg:squarederror",
      na.action = na.pass)
#>
#> Attaching package: 'xgboost'
#> The following object is masked from 'package:dplyr':
#>
#> slice
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#> objective
#> Only the last value for each of them will be used.
#> Warning: model fit failed for Fold1: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) :
#> cannot coerce type 'closure' to vector of type 'character'
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#> objective
#> Only the last value for each of them will be used.
#> Warning: model fit failed for Fold2: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) :
#> cannot coerce type 'closure' to vector of type 'character'
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#> objective
#> Only the last value for each of them will be used.
#> Warning: model fit failed for Fold3: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) :
#> cannot coerce type 'closure' to vector of type 'character'
#> Warning in train_rec(rec = x, dat = data, info = trainInfo, method = models, :
#> There were missing values in resampled performance measures.
#> Something is wrong; all the RMSE metric values are missing:
#> RMSE Rsquared MAE
#> Min. : NA Min. : NA Min. : NA
#> 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
#> Median : NA Median : NA Median : NA
#> Mean :NaN Mean :NaN Mean :NaN
#> 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
#> Max. : NA Max. : NA Max. : NA
#> NA's :1 NA's :1 NA's :1
#> Error: Stopping
Created on 2021-01-17 by the reprex package (v0.3.0)
Session info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reprex_0.3.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 rstudioapi_0.13 whisker_0.4 knitr_1.30 magrittr_2.0.1 bit_4.0.4 R6_2.5.0 rlang_0.4.10
[9] tools_4.0.3 xfun_0.19 coro_1.0.1 clipr_0.7.1 htmltools_0.5.0 ellipsis_0.3.1 bit64_4.0.5 digest_0.6.27
[17] tibble_3.0.4 lifecycle_0.2.0 torchvision_0.2.0 crayon_1.3.4 processx_3.4.5 torch_0.2.1.9000 callr_3.5.1 vctrs_0.3.5
[25] fs_1.5.0 ps_1.5.0 evaluate_0.14 rmarkdown_2.6 compiler_4.0.3 pillar_1.4.7 pkgconfig_2.0.3
Hello @LordRudolf, this feels like more of a {caret} issue than a {recipes} issue.
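One possible reading of the `cannot coerce type 'closure'` error, offered as a guess rather than a confirmed diagnosis: the formula method of `train()` has an `na.action` argument, while the recipe method, going by its documented signature, does not. In that case `na.action = na.pass` would fall into `...` and presumably be handed on to the model fit as if it were an xgboost parameter. `na.pass` is a function, and coercing a function to character fails with a message of this form. The base-R sketch below only reproduces that coercion failure. It does not call caret or xgboost, and the names `is_fun` and `err` are just for illustration.

```r
# na.pass is a function (a closure), not a character or numeric value.
is_fun <- is.function(stats::na.pass)
print(is_fun)

# Coercing a closure to character raises an error; capture its message
# instead of letting it stop the script.
err <- tryCatch(
  as.character(stats::na.pass),
  error = function(e) conditionMessage(e)
)
print(err)
```

If that reading is right, dropping `na.action = na.pass` from the recipe-based `train()` call may be enough, leaving missing values to the recipe steps or to the model itself. I have not verified this against the reprex above.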