tune
`tune_grid()` fails if parameter is not marked as `tunable()` despite matching grid
In #633, we enabled tune to deal with list-columns in the grid, in order to allow people to use a custom grid to tune lower-level arguments.
The example there was based on wanting to tune parameters to a custom loss function that was passed to the objective engine argument for xgboost. When applying the same approach to a recipe step, it fails:
library(tidymodels)
data("credit_data", package = "modeldata")
set.seed(342)
credit_resamples <- vfold_cv(credit_data, v = 5)
rec <- recipe(Price ~ ., data = credit_data) %>%
  step_impute_bag(
    Status, Home, Marital, Job, Income, Assets, Debt,
    # instead of
    #   options = list(nbagg = tune())
    # do
    options = tune()
  )
credit_wflow <- workflow(rec, linear_reg())
# make grid manually
grid_options <- tibble(nbagg = seq(15, 30, by = 5)) %>%
  mutate(options = map(nbagg, ~ list(nbagg = .))) %>%
  select(options)
credit_res <- tune_grid(credit_wflow, credit_resamples, grid = grid_options)
#> Warning: No tuning parameters have been detected, performance will be evaluated
#> using the resamples with no tuning. Did you want to [tune()] parameters?
#> → A | error: You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'options'. Do you want to use a tuning function such as `tune_grid()`?
#> There were issues with some computations A: x1
#> There were issues with some computations A: x5
#>
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
Created on 2023-04-04 with reprex v2.0.2
More details on why this fails are below but the higher-level question is: If you tag something for tuning and bring your own grid for it, shouldn't it work?
Currently, this traces back to tune_args() for recipes limiting the scope of checking for tune() tags to arguments which were designated for tuning via tunable().
https://github.com/tidymodels/recipes/blob/567fe17cddb2cb880bcd30daa7737effca065ed9/R/tune_args.R#L29-L32
We can get past some of the earlier checks by giving tune_grid() a decoy parameter set. However, tune_args() is called here:
https://github.com/tidymodels/tune/blob/802909c7aa4c554b3890b7ff23a25bfd80e601dd/R/checks.R#L75
and because its result for this recipe is empty, this is where the journey currently ends.
tune_args(rec)
#> # A tibble: 0 × 6
#> # ℹ 6 variables: name <chr>, tunable <lgl>, id <chr>, source <chr>,
#> # component <chr>, component_id <chr>
fake_pset <- parameters(list(options = penalty()))
credit_res <- tune_grid(credit_wflow, credit_resamples,
                        grid = grid_options, param_info = fake_pset)
#> Error in `check_grid()`:
#> ! The provided `grid` has the following parameter columns that have not been marked for tuning by `tune()`: 'options'.
#> Backtrace:
#>     ▆
#>  1. ├─tune::tune_grid(...)
#>  2. └─tune:::tune_grid.workflow(...)
#>  3.   └─tune:::tune_grid_workflow(...)
#>  4.     └─tune:::check_grid(grid = grid, workflow = workflow, pset = pset)
#>  5.       └─rlang::abort(msg)
Is this a matter of reworking the checks or the error messages? Or a bigger question of where do we accept tune() tags and how to deal with them?
The example is motivated by https://github.com/tidymodels/dials/issues/154
Wasn't sure whether to apply bug or feature, but either way, I agree that this ought to be fair game. :)
A bit more context from poking at this for a moment...
tune_args() methods are intended to return arguments marked for tuning, while tunable() methods are intended to return arguments marked for tuning that we can associate dials parameter information with. In some places, tune_args() methods more closely resemble tunable() methods, which makes it difficult for tune to handle custom grids in a principled way. In theory, if a user provides their own grid, we should be able to rely only on tune_args() methods when running tune_grid(). In that case, tune_grid() takes care of collecting and then injecting each needed value, and recipes and/or parsnip never need to know they're handling tuning parameters.
So, step 1 is to disambiguate tune_args() and tunable() in implementations. :)
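To make the distinction concrete, here is a minimal sketch that calls both generics on a recipe where the tagged argument (`trees`) is one that step_impute_bag() does declare as tunable. The object names `rec_trees`, `marked`, and `known` are made up for illustration, and the exact tibbles printed depend on the installed recipes version.

```r
library(tidymodels)
data("credit_data", package = "modeldata")

# `trees` is tagged with tune() and is also declared tunable by the step
rec_trees <- recipe(Price ~ ., data = credit_data) %>%
  step_impute_bag(Income, trees = tune())

# Arguments that carry a tune() tag
marked <- tune_args(rec_trees)

# Arguments for which the step supplies dials parameter information
known <- tunable(rec_trees)

marked$name
known$name
known$call_info
```

For `trees` the two results agree. The `options = tune()` recipe from the original report is the case where, in the recipes version discussed here, `tune_args()` comes back empty because the argument has no `tunable()` entry.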
Just to give another example with step_holiday(), whose holidays argument is not tunable.
library(tidyverse)
library(tidymodels)
library(lubridate) # ymd() and days(); only attached by library(tidyverse) from tidyverse 2.0.0 on
examples <- data.frame(someday = ymd("2000-12-20") + days(0:40))
holiday_rec <-
  recipe(~someday, examples) %>%
  step_holiday(all_predictors(), holidays = c("Easter", "ChristmasDay"))
There are 2^119 combinations of holidays, so making it tunable could be dangerous, but it would be nice if we could tune it over a set of defined values, like this (list-columns in the grid are okay now, see #633):
tibble(holidays = list(c("LaborDay", "NewYearsDay", "ChristmasDay"),
                       c("LaborDay", "NewYearsDay", "ChristmasDay", "Easter", "Annunciation"),
                       c("FRAllSaints", "FRBastilleDay", "FRAscension")))
# # A tibble: 3 × 1
# holidays
# <list>
# 1 <chr [3]>
# 2 <chr [5]>
# 3 <chr [3]>
Thanks!
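Until something like this is supported, the injection that tune_grid() would perform can be approximated by hand. The sketch below builds one recipe per row of the list-column grid and preps each of them. `make_holiday_rec()` and `holiday_grid` are hypothetical names, not part of recipes or tune, and the timeDate package needs to be installed for step_holiday(). This only illustrates the preprocessing side; no model is fit or resampled here.

```r
library(tidymodels)
library(lubridate)

examples <- data.frame(someday = ymd("2000-12-20") + days(0:40))

# Candidate holiday sets, stored as a list-column
holiday_grid <- tibble(holidays = list(
  c("LaborDay", "NewYearsDay", "ChristmasDay"),
  c("LaborDay", "NewYearsDay", "ChristmasDay", "Easter", "Annunciation"),
  c("FRAllSaints", "FRBastilleDay", "FRAscension")
))

# Hypothetical helper: build the recipe for one candidate value
make_holiday_rec <- function(holidays) {
  recipe(~someday, data = examples) %>%
    step_holiday(all_predictors(), holidays = holidays)
}

# Prep and bake one recipe per grid row
baked <- map(holiday_grid$holidays, function(h) {
  bake(prep(make_holiday_rec(h)), new_data = NULL)
})

# Inspect which indicator columns each candidate produces
map(baked, names)
```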