caret xgb na.action=na.pass does not work while using recipes

Open. LordRudolf opened this issue 4 years ago • 1 comment

The same dataset, containing a single NA value, fails in the caret + recipes pipeline, whereas plain caret (the formula interface) fits it without any issues.

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step

data(cars)
cars$Mileage[100] <- NA
## Without recipes 
train(Price ~., 
      trControl = trainControl(
        method = 'CV',
        number = 3 #Reduced the number of CV-folds. Otherwise we would get a bunch of warnings
      ),
      data = cars,
      tuneLength = 1,
      method = "xgbLinear",
      objective = "reg:squarederror",
      na.action = na.pass)
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.

#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.

#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.

#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.
#> eXtreme Gradient Boosting 
#> 
#> 804 samples
#>  17 predictor
#> 
#> No pre-processing
#> Resampling: Cross-Validated (3 fold) 
#> Summary of sample sizes: 536, 536, 536 
#> Resampling results:
#> 
#>   RMSE      Rsquared   MAE     
#>   2442.675  0.9394766  1674.597
#> 
#> Tuning parameter 'nrounds' was held constant at a value of 50
#> Tuning
#>  'alpha' was held constant at a value of 0
#> Tuning parameter 'eta' was
#>  held constant at a value of 0.3

## With recipes
rec <- recipe(Price ~., data = cars)
train(rec,
      data = cars,
      trControl = trainControl(
        method = 'CV',
        number = 3
      ),
      tuneLength = 1,
      method = "xgbLinear",
      objective = "reg:squarederror",
      na.action = na.pass)
#> 
#> Attaching package: 'xgboost'
#> The following object is masked from 'package:dplyr':
#> 
#>     slice
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.
#> Warning: model fit failed for Fold1: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) : 
#>   cannot coerce type 'closure' to vector of type 'character'
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.
#> Warning: model fit failed for Fold2: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) : 
#>   cannot coerce type 'closure' to vector of type 'character'
#> Warning in check.booster.params(params, ...): The following parameters were provided multiple times:
#>  objective
#>   Only the last value for each of them will be used.
#> Warning: model fit failed for Fold3: lambda=0, alpha=0, nrounds=50, eta=0.3 Error in as.character(x) : 
#>   cannot coerce type 'closure' to vector of type 'character'
#> Warning in train_rec(rec = x, dat = data, info = trainInfo, method = models, :
#> There were missing values in resampled performance measures.
#> Something is wrong; all the RMSE metric values are missing:
#>       RMSE        Rsquared        MAE     
#>  Min.   : NA   Min.   : NA   Min.   : NA  
#>  1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
#>  Median : NA   Median : NA   Median : NA  
#>  Mean   :NaN   Mean   :NaN   Mean   :NaN  
#>  3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
#>  Max.   : NA   Max.   : NA   Max.   : NA  
#>  NA's   :1     NA's   :1     NA's   :1
#> Error: Stopping

Created on 2021-01-17 by the reprex package (v0.3.0)

Session info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reprex_0.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        rstudioapi_0.13   whisker_0.4       knitr_1.30        magrittr_2.0.1    bit_4.0.4         R6_2.5.0          rlang_0.4.10     
 [9] tools_4.0.3       xfun_0.19         coro_1.0.1        clipr_0.7.1       htmltools_0.5.0   ellipsis_0.3.1    bit64_4.0.5       digest_0.6.27    
[17] tibble_3.0.4      lifecycle_0.2.0   torchvision_0.2.0 crayon_1.3.4      processx_3.4.5    torch_0.2.1.9000  callr_3.5.1       vctrs_0.3.5      
[25] fs_1.5.0          ps_1.5.0          evaluate_0.14     rmarkdown_2.6     compiler_4.0.3    pillar_1.4.7      pkgconfig_2.0.3  

LordRudolf, Jan 17 '21 19:01

Hello @LordRudolf, this feels more like a {caret} issue than a {recipes} issue.

EmilHvitfeldt, Mar 30 '23 21:03
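
A possible reading of the error in the reprex, offered as a sketch rather than a confirmed diagnosis: "cannot coerce type 'closure' to vector of type 'character'" is the message R gives when a function is passed to as.character(). That would be consistent with the recipe method of train() having no na.action argument of its own, so that na.action = na.pass falls into `...` and is forwarded to the model fit as if it were a booster parameter. The forwarding is an assumption here; it was not verified against the caret source. The base R snippet below only illustrates the mechanism. toy_backend(), toy_formula_fit() and toy_recipe_fit() are hypothetical stand-ins, not caret or xgboost functions.

```r
# Base R only; nothing here calls caret, recipes, or xgboost.

# 1. Coercing a function (a "closure") to character raises an error.
msg <- tryCatch(
  as.character(stats::na.pass),
  error = function(e) conditionMessage(e)
)
print(msg)

# 2. Hypothetical stand-in for a backend that stringifies every
#    extra parameter it receives.
toy_backend <- function(params) {
  vapply(params, function(p) as.character(p)[1], character(1))
}

# Hypothetical formula-style interface: `na.action` is a formal argument,
# so it is consumed here and never forwarded to the backend.
toy_formula_fit <- function(..., na.action = stats::na.fail) {
  toy_backend(list(...))
}

# Hypothetical recipe-style interface: no `na.action` formal (assumed),
# so the function lands in `...` and reaches the backend.
toy_recipe_fit <- function(...) {
  toy_backend(list(...))
}

ok <- toy_formula_fit(objective = "reg:squarederror",
                      na.action = stats::na.pass)
bad <- tryCatch(
  toy_recipe_fit(objective = "reg:squarederror",
                 na.action = stats::na.pass),
  error = function(e) conditionMessage(e)
)
print(ok)
print(bad)
```

If this reading is right, the first call succeeds and the second one fails with the coercion error. Whether dropping na.action from the recipe call, or handling the NA inside the recipe instead, is the appropriate workaround would still need to be checked against caret itself.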