
Option to run tune_grid() and other long-running tidymodels processes as an RStudio or Workbench Job

Open jthomasmock opened this issue 2 years ago • 0 comments

Feature

In many situations, tidymodels is used primarily for interactive data analysis and model fitting: the user explores the data in real time, preprocesses it, chooses a model, bundles it into a workflow, and then starts training/fitting the model.

However, grid tuning, cross-validation, etc. are likely to be:

  • Long running, blocking the console/session until completion
  • Well suited to parallelization, from which they benefit greatly
  • Largely non-interactive, in that you are stuck waiting for everything to finish

RStudio Jobs provide a way to start a background R process, execute a script, and then either return the results to the interactive session (directly into the global environment or into a named results object) or save them to disk.

There is optional support for copying the current global environment into the background process, so you can move from interactive to automated work and back as needed. Alternatively, the script can define the entire pipeline itself, including pre-processing, workflow building, model selection, tuning/resampling, fitting, and evaluating the model.
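As a rough illustration of the mechanics described above, here is a minimal sketch that launches a script as a background job through `rstudioapi::jobRunScript()`. The wrapper name `run_in_background()`, the job name, and the script file name in the usage line are assumptions made for illustration; they are not part of tidymodels or RStudio.

```r
# Sketch: launch an existing R script as an RStudio / Workbench background job.
# `run_in_background()` is a hypothetical helper, not an existing API.
run_in_background <- function(script_path, import_env = FALSE) {
  stopifnot(file.exists(script_path))

  # Jobs only exist inside RStudio / Workbench; fall back gracefully elsewhere.
  in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
    rstudioapi::isAvailable()
  if (!in_rstudio) {
    message("RStudio is not available; run the script with source() or Rscript instead.")
    return(invisible(NULL))
  }

  rstudioapi::jobRunScript(
    path       = script_path,
    name       = "tidymodels tuning",  # label shown in the Jobs pane (assumed)
    workingDir = getwd(),
    importEnv  = import_env,           # copy the global environment into the job
    exportEnv  = "R_GlobalEnv"         # copy the job's objects back when it finishes
  )
}
```

Usage would look something like `job_id <- run_in_background("tune_script.R", import_env = TRUE)`, where `tune_script.R` is a hypothetical file. Setting `exportEnv = ""` instead exports nothing, which suits scripts that save their own results to disk.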

A secondary benefit is that moving long-running, intensive work into the background doesn't lock up your console. You could do lightweight EDA, plotting, setting up additional model workflows, etc., all while your tuning/training runs in parallel (both in the sense of multiple cores/system time and in the sense of human hands-on-keyboard time).

Thus, my feature request is an optional argument for tune_grid(), fit_resamples(), control_grid(), etc. that runs parallel or otherwise long-running processes as a background job in RStudio.

As an example of an existing implementation, guildai has guild_run(), which runs a script and defaults to running it as a background job; the option can be turned on or off. https://guildai.github.io/guildai-r/reference/guild_run.html

Shiny apps have first-class support for running as a background job in RStudio, via the Run App > In Background Job button.


There is also an option to run a Plumber API or Vetiver Model as a background job, allowing for interactive testing of the model endpoint prior to deploying to production.

I also have some examples of manually running model training/cross validation as a Background Job, where the control = control_grid(verbose = TRUE) option is very useful for tracking progress!
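For readers who want to try the manual approach, here is a minimal sketch of a script that could be run as a background job. The data set (`mtcars`), the lasso model, the grid size, and the output file name are all assumptions chosen for illustration, not the examples from this issue; the script also assumes the glmnet package is installed.

```r
# tune_script.R (hypothetical file name), intended to be run as a background job
library(tidymodels)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

# Lasso regression with the penalty left to be tuned
spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(spec)

tune_res <- tune_grid(
  wf,
  resamples = folds,
  grid      = 10,                           # up to 10 candidate penalty values
  control   = control_grid(verbose = TRUE)  # print progress to the job's log
)

# Save the results so the interactive session can read them back later
saveRDS(tune_res, "tune_res.rds")
```

Because the job's console output appears in the Jobs pane, the `verbose = TRUE` messages act as a simple progress log while the interactive console stays free.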


jthomasmock, Mar 23 '23 16:03