tune
Regression running tune_bayes on mars with tune 1.1.2
With the latest version of tune (1.1.2), using tune_bayes on a mars model raises lots of errors. No errors are raised with version 1.1.1. Here is a reprex, showing the errors with version 1.1.2.
library(tidymodels)
two_rec <- recipe(Class ~ ., data = two_class_dat)
mars_spec <- parsnip::mars() %>%
  parsnip::set_engine("earth") %>%
  parsnip::set_mode("classification")
mars_tune_spec <- parsnip::mars(num_terms = tune()) %>%
  parsnip::set_engine("earth") %>%
  parsnip::set_mode("classification")
two_tune_wkflow <- # new workflow object
  workflow() %>% # use workflow function
  add_recipe(two_rec) %>% # add the new recipe
  add_model(mars_tune_spec)
two_cv <- vfold_cv(two_class_dat, v = 3)
two_bayer_res <- tune_bayes(two_tune_wkflow,
  resamples = two_cv, initial = 8
)
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> → A | error: `num_terms` should be >= 1.
#> There were issues with some computations A: x1
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more information.
#> no non-missing arguments to max; returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> There were issues with some computations A: x14
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more information.
#> no non-missing arguments to max; returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Warning in max(grid[[nm]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> ! No improvement for 10 iterations; returning current results.
#> There were issues with some computations A: x14
#> There were issues with some computations A: x30
Created on 2023-09-15 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.1 (2023-06-16)
#> os Ubuntu 22.04.3 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_GB:en
#> collate en_GB.UTF-8
#> ctype en_GB.UTF-8
#> tz Europe/London
#> date 2023-09-15
#> pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)
#> broom * 1.0.5 2023-06-09 [1] CRAN (R 4.3.1)
#> class 7.3-22 2023-05-03 [3] CRAN (R 4.3.1)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
#> codetools 0.2-19 2023-02-01 [3] CRAN (R 4.2.2)
#> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
#> data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.0)
#> dials * 1.2.0 2023-04-03 [1] CRAN (R 4.3.0)
#> DiceDesign 1.9 2021-02-13 [1] CRAN (R 4.3.0)
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)
#> dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.3.0)
#> earth * 5.3.2 2023-01-26 [1] CRAN (R 4.3.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
#> foreach 1.5.2 2022-02-02 [1] CRAN (R 4.3.0)
#> Formula * 1.2-5 2023-02-24 [1] CRAN (R 4.3.0)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1)
#> furrr 0.3.1 2022-08-15 [1] CRAN (R 4.3.0)
#> future 1.33.0 2023-07-01 [1] CRAN (R 4.3.1)
#> future.apply 1.11.0 2023-05-21 [1] CRAN (R 4.3.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
#> ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.3.0)
#> globals 0.16.2 2022-11-21 [1] CRAN (R 4.3.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
#> gower 1.0.1 2022-12-22 [1] CRAN (R 4.3.0)
#> GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.3.0)
#> gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)
#> hardhat 1.3.0 2023-03-30 [1] CRAN (R 4.3.0)
#> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
#> infer * 1.0.4 2022-12-02 [1] CRAN (R 4.3.0)
#> ipred 0.9-14 2023-03-09 [1] CRAN (R 4.3.0)
#> iterators 1.0.14 2022-02-05 [1] CRAN (R 4.3.0)
#> knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0)
#> lattice 0.21-8 2023-04-05 [3] CRAN (R 4.3.0)
#> lava 1.7.2.1 2023-02-27 [1] CRAN (R 4.3.0)
#> lhs 1.1.6 2022-12-17 [1] CRAN (R 4.3.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
#> listenv 0.9.0 2022-12-16 [1] CRAN (R 4.3.0)
#> lubridate 1.9.2 2023-02-10 [1] CRAN (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> MASS 7.3-60 2023-05-04 [3] CRAN (R 4.3.1)
#> Matrix 1.6-0 2023-07-08 [3] CRAN (R 4.3.1)
#> modeldata * 1.1.0 2023-01-25 [1] CRAN (R 4.3.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
#> nnet 7.3-19 2023-05-03 [3] CRAN (R 4.3.1)
#> parallelly 1.36.0 2023-05-26 [1] CRAN (R 4.3.0)
#> parsnip * 1.1.0 2023-04-12 [1] CRAN (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
#> plotmo * 3.6.2 2022-05-21 [1] CRAN (R 4.3.0)
#> plotrix * 3.8-2 2021-09-08 [1] CRAN (R 4.3.0)
#> prodlim 2023.03.31 2023-04-02 [1] CRAN (R 4.3.0)
#> purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.0)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
#> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)
#> recipes * 1.0.6 2023-04-25 [1] CRAN (R 4.3.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
#> rmarkdown 2.23 2023-07-01 [1] CRAN (R 4.3.1)
#> rpart 4.1.19 2022-10-21 [3] CRAN (R 4.2.1)
#> rsample * 1.2.0 2023-08-23 [1] CRAN (R 4.3.1)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)
#> scales * 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> styler 1.10.1 2023-06-05 [1] CRAN (R 4.3.1)
#> survival 3.5-5 2023-03-12 [3] CRAN (R 4.3.1)
#> TeachingDemos * 2.12 2020-04-07 [1] CRAN (R 4.3.0)
#> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
#> tidymodels * 1.1.0 2023-05-01 [1] CRAN (R 4.3.0)
#> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
#> timeDate 4022.108 2023-01-07 [1] CRAN (R 4.3.0)
#> tune * 1.1.2 2023-08-23 [1] CRAN (R 4.3.1)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
#> vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.1)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
#> workflows * 1.1.3 2023-02-22 [1] CRAN (R 4.3.0)
#> workflowsets * 1.0.1 2023-04-06 [1] CRAN (R 4.3.0)
#> xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
#> yardstick * 1.2.0 2023-04-21 [1] CRAN (R 4.3.0)
#>
#> [1] /home/andrea/R/x86_64-pc-linux-gnu-library
#> [2] /usr/lib/R/site-library
#> [3] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Thanks for the issue! This is a strange one.
With tune 1.1.1, I see:
two_bayer_res %>% collect_metrics()
#> # A tibble: 8 × 8
#> num_terms .metric .estimator mean n std_err .config .iter
#> <int> <chr> <chr> <dbl> <int> <dbl> <chr> <int>
#> 1 2 accuracy binary 0.784 3 0.0195 Preprocessor1_Model1 0
#> 2 2 roc_auc binary 0.866 3 0.0101 Preprocessor1_Model1 0
#> 3 3 accuracy binary 0.829 3 0.0122 Preprocessor1_Model2 0
#> 4 3 roc_auc binary 0.885 3 0.0105 Preprocessor1_Model2 0
#> 5 4 accuracy binary 0.822 3 0.0174 Preprocessor1_Model3 0
#> 6 4 roc_auc binary 0.883 3 0.0116 Preprocessor1_Model3 0
#> 7 5 accuracy binary 0.819 3 0.0150 Preprocessor1_Model4 0
#> 8 5 roc_auc binary 0.883 3 0.0110 Preprocessor1_Model4 0
Note that all .iter values are 0, i.e. the Bayesian search never started after the initial grid. I see the same output with 1.1.2.
Given the few changes that were made in 1.1.2, this seems to be an issue with our logging previously (resolved in https://github.com/tidymodels/tune/pull/682) rather than a newly introduced issue in tuning. Still need to troubleshoot what that newly surfaced issue is.
Ah, yes. With 1.1.2, if you set control = control_bayes(TRUE, TRUE), then you'll see the errors go away, as before.
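For readers who prefer named arguments, here is a minimal sketch of that control object. It assumes that the two positional TRUE values map to the verbose and verbose_iter arguments of control_bayes(), which is my reading of the function's argument order rather than something stated in this thread.

```r
library(tune)

# Assumption: control_bayes(TRUE, TRUE) is positional shorthand for
# verbose = TRUE and verbose_iter = TRUE.
verbose_ctrl <- control_bayes(verbose = TRUE, verbose_iter = TRUE)

# Inspect the two flags on the resulting control object.
verbose_ctrl$verbose
verbose_ctrl$verbose_iter
```

The object can then be passed to the call from the reprex as tune_bayes(two_tune_wkflow, resamples = two_cv, initial = 8, control = verbose_ctrl).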
The underlying issue is that the GP can't predict any possible new points. The default num_terms() parameter object will only result in searches across integers in [2, 5]. That initial search covers all of those possible num_terms values, so pred_gp() returns early, noting that there were no more candidate models.
https://github.com/tidymodels/tune/blob/74854a59bcef48106ea691f7dc9e2efad71b566f/R/tune_bayes.R#L375-L380
https://github.com/tidymodels/tune/blob/74854a59bcef48106ea691f7dc9e2efad71b566f/R/tune_bayes.R#L584-L591
That message is passed to tune_log() and then promptly ignored due to verbosity settings, resulting in that "num_terms should be >= 1." error downstream (since the GP is passing on an NA for num_terms). I think that logic was written before early returns from tune_bayes_workflow() were properly caught and intermediate results returned, and it was maybe(?) implemented that way so that some results made it out of tune_bayes() in that case. I'd argue we ought to stop optimization and exit early with a more informative error in this case.
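The situation described above can be mimicked in base R. This is a hypothetical illustration only, not tune's internal code: candidate_values, initial_values, and check_candidates() are made-up names, and the guard is just one way the proposed early exit might look.

```r
# Hypothetical illustration; none of these names come from tune.
candidate_values <- 2:5                # integers in [2, 5], as described above
initial_values <- c(2L, 3L, 4L, 5L)    # num_terms values in the initial results

# Nothing is left for the search to propose.
remaining <- setdiff(candidate_values, initial_values)
length(remaining)

# Taking max() of an empty vector warns and returns -Inf, the same kind of
# warning that appears in the reprex output.
next_value <- suppressWarnings(max(remaining, na.rm = TRUE))
next_value

# A sketch of an early exit with a more informative message.
check_candidates <- function(remaining) {
  if (length(remaining) == 0) {
    stop(
      "The initial grid already covers every candidate value; ",
      "no new points are available for the search.",
      call. = FALSE
    )
  }
  invisible(remaining)
}
```

Calling check_candidates(remaining) here stops with the message instead of letting a missing value travel further downstream.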
I think I'll wait on addressing this in favor of a refactor of tune_log(), which should simplify the machinery for early exits.
Thank you @simonpcouch for looking into this. I have just checked, and the situation is the same in the original analysis that inspired the reprex, so at least this is not a corner case due to the example dataset.
Waiting for a refactor of tune_log() sounds sensible; it is now clear how to manage this issue in the meantime.
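One possible way to manage this in the meantime, which is not proposed anywhere in this thread and is only a sketch, is to give the search more candidate values than the initial grid can cover. The range of 2 to 20 below is an arbitrary assumption, and whether larger values are useful depends on the data.

```r
library(tidymodels)

mars_tune_spec <- mars(num_terms = tune()) %>%
  set_engine("earth") %>%
  set_mode("classification")

two_tune_wkflow <- workflow() %>%
  add_recipe(recipe(Class ~ ., data = two_class_dat)) %>%
  add_model(mars_tune_spec)

# Pull out the parameter set that the tuning functions would use by default.
mars_params <- extract_parameter_set_dials(two_tune_wkflow)

# Assumption: an upper bound of 20 is arbitrary and chosen for illustration.
wider_params <- update(mars_params, num_terms = num_terms(range = c(2L, 20L)))

# Inspect the updated range.
range_get(wider_params$object[[1]])
```

The updated set can be supplied through the param_info argument, for example tune_bayes(two_tune_wkflow, resamples = two_cv, param_info = wider_params, initial = 5), so that the initial grid no longer covers every candidate.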