tune
Hard-to-understand error message with CV and missing data
I've found that cross-validation fails with a hard-to-understand error message (see figure for example) when the predictor variables contain missing entries. I think it's right to throw an error here, but the message should be tweaked to say something about missingness (right now it only complains about mismatched row sizes).
Thanks!
Minimal reproducible example:
library(tidymodels)
mtcars$carb[mtcars$carb == 4] <- NA
spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 5) |>
  set_engine("kknn") |>
  set_mode("regression")
rec <- recipe(mpg ~ disp + carb, data = mtcars)
vfold <- vfold_cv(mtcars, v = 10)
workflow() |>
  add_recipe(rec) |>
  add_model(spec) |>
  fit_resamples(resamples = vfold) |>
  collect_metrics()

Hello @trevorcampbell, thanks for the issue! I'm moving it to {tune} since that is where the problem lies. The error happens in tune:::predict_model() when predict() is called: predict() returns fewer rows than there are rows in the assessment set, so attaching the original row numbers fails with this size-mismatch error.
It should be fixed to be more informative for the user!
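To illustrate the kind of message being asked for, here is a minimal sketch of a guard that compares the number of predictions with the number of new-data rows and mentions missingness when they disagree. The helper name `check_prediction_rows()` and its wording are hypothetical; this is not tune's actual code, and the idea that such a check would sit right after `predict()` inside `tune:::predict_model()` is an assumption.

```r
# Hypothetical guard (not tune's actual implementation): compare the number of
# predictions with the number of new-data rows and, if they differ, report how
# many rows contain missing values instead of a generic size mismatch.
check_prediction_rows <- function(preds, new_data) {
  n_expected <- nrow(new_data)
  n_got <- nrow(preds)
  if (n_got != n_expected) {
    # complete.cases() looks at every column, so this counts rows with an NA
    # in any column of new_data (predictors and outcome alike).
    n_incomplete <- sum(!stats::complete.cases(new_data))
    stop(sprintf(
      "The model returned %d predictions for %d rows of new data. %d of those rows contain missing values; consider imputing or removing them in the recipe.",
      n_got, n_expected, n_incomplete
    ), call. = FALSE)
  }
  invisible(preds)
}

# Illustrative data: four assessment rows, two with a missing predictor, and
# an engine that silently dropped the incomplete rows.
assess <- data.frame(disp = c(160, 108, 258, 360), carb = c(NA, 1, 1, NA))
preds <- data.frame(.pred = c(22.8, 21.4))
tryCatch(
  check_prediction_rows(preds, assess),
  error = function(e) message(conditionMessage(e))
)
```

Counting incomplete rows in the message points the user at the cause (missing data) rather than at the symptom (a `.row` size mismatch in `mutate()`).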
Thanks @EmilHvitfeldt -- I figured I might have pinged the wrong group ;)
No problem! We would much rather have a useful issue posted in the wrong repository and move it than not have the issue posted at all :)
Related to tidymodels/parsnip#812, though I agree that they're distinct and ought to live where they do. 🌻 Different engines fail to different degrees here, so fixes may need to live in multiple places depending on the level at which we address this:
library(tidymodels)
mtcars$carb[mtcars$carb == 4] <- NA
rec <- recipe(mpg ~ disp + carb, data = mtcars)
set.seed(1)
vfold <- vfold_cv(mtcars, v = 10)
workflow() |>
  add_recipe(rec) |>
  add_model(nearest_neighbor(mode = "regression")) |>
  fit_resamples(resamples = vfold) |>
  collect_metrics()
#> x Fold01: preprocessor 1/1, model 1/1 (predictions):
#> Error in `mutate()`:
#> ! Problem while computing `.row = orig_rows`.
#> ✖ `.row` must be size 2 or 1, not 4.
#> x Fold04: preprocessor 1/1, model 1/1 (predictions):
#> Error in `mutate()`:
#> ! Problem while computing `.row = orig_rows`.
#> ✖ `.row` must be size 1, not 3.
#> x Fold05: preprocessor 1/1, model 1/1 (predictions):
#> Error in `mutate()`:
#> ! Problem while computing `.row = orig_rows`.
#> ✖ `.row` must be size 2 or 1, not 3.
#> x Fold09: preprocessor 1/1, model 1/1 (predictions):
#> Error in `mutate()`:
#> ! Problem while computing `.row = orig_rows`.
#> ✖ `.row` must be size 0 or 1, not 3.
#> x Fold10: preprocessor 1/1, model 1/1 (predictions):
#> Error in `mutate()`:
#> ! Problem while computing `.row = orig_rows`.
#> ✖ `.row` must be size 1, not 3.
#> # A tibble: 2 × 6
#> .metric .estimator mean n std_err .config
#> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 rmse standard 3.20 5 0.399 Preprocessor1_Model1
#> 2 rsq standard 0.700 5 0.189 Preprocessor1_Model1
workflow() |>
  add_recipe(rec) |>
  add_model(decision_tree(mode = "regression")) |>
  fit_resamples(resamples = vfold) |>
  collect_metrics()
#> ! Fold01: internal: A correlation computation is required, but `estimate` is constant and ha...
#> ! Fold05: internal: A correlation computation is required, but `estimate` is constant and ha...
#> ! Fold07: internal: A correlation computation is required, but `estimate` is constant and ha...
#> ! Fold10: internal: A correlation computation is required, but `estimate` is constant and ha...
#> # A tibble: 2 × 6
#> .metric .estimator mean n std_err .config
#> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 rmse standard 4.07 10 0.410 Preprocessor1_Model1
#> 2 rsq standard 0.866 6 0.0585 Preprocessor1_Model1
Created on 2022-12-13 with reprex v2.0.2
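Until the error message is improved, a user-side check before calling `fit_resamples()` can surface the missing predictor values directly. The helper `report_missing_predictors()` below is hypothetical and written in base R for this sketch; it is not part of tidymodels.

```r
# Hypothetical helper (not part of tidymodels): count missing values in the
# named predictor columns and return only the columns that have any.
report_missing_predictors <- function(data, predictors) {
  n_missing <- colSums(is.na(data[predictors]))
  n_missing[n_missing > 0]
}

# Same data preparation as in the reprex above
dat <- mtcars
dat$carb[dat$carb == 4] <- NA
report_missing_predictors(dat, c("disp", "carb"))
```

An empty result means the listed predictors are complete; a non-empty one names the columns that would make some engines return fewer predictions than assessment rows.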