
Feature Request: ability to pass dataframe to `validation` argument of `xgboost`

Open joeycouse opened this issue 2 years ago • 2 comments

Related to #760

The current implementation of the validation parameter in boost_tree() only sets the proportion of the training data to hold out as the validation set. It would be great to be able to pass a data frame as the validation argument as well. This would be really helpful when there is a grouping structure within the data and you want to test whether the model generalizes to different groups, and it would bring parsnip's capabilities in line with xgb.train().

Not a great example, but just for demonstration:

library(modeldata)
library(dplyr)    # filter()
library(parsnip)  # boost_tree(), set_engine()

data("penguins")

train <- 
  penguins |> 
  filter(species %in% c("Gentoo", "Adelie"))

valid <-
  penguins |> 
  filter(!(species %in% c("Gentoo", "Adelie")))


# Proposed usage (not currently supported): pass a data frame as the validation set
boost_tree(mode = 'regression',
           mtry = 3,
           tree_depth = 2,
           stop_iter = 5) |> 
  set_engine("xgboost", validation = valid)
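
For comparison, here is a minimal sketch of the form that is supported today, assuming the xgboost engine: `validation` is a proportion of the training data that gets held out internally for early stopping. The value 0.2 is only an illustrative choice.

```r
library(parsnip)

# Currently supported: `validation` is a proportion (here 20%) of the
# training data, held out internally and used with `stop_iter`.
spec <-
  boost_tree(mode = "regression",
             mtry = 3,
             tree_depth = 2,
             stop_iter = 5) |>
  set_engine("xgboost", validation = 0.2)

spec
```

With this form the held-out rows are chosen for you, so there is no way to make the validation set a specific group such as a held-out species.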


joeycouse avatar Jul 12 '22 15:07 joeycouse

Thanks for the issue! This is an interesting idea and one that we ought to consider. xgboost and lightgbm's interfaces for validation sets allow for a lot of user control, but we'd need to think carefully about what a tidymodels-esque interface might feel like here.
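
As a rough illustration of that user control, here is a sketch of calling xgboost directly with an explicit validation set. It assumes the `watchlist` argument of `xgb.train()` as it existed in the xgboost R package around the time of this issue (newer releases may name this argument differently), and it uses the built-in `mtcars` data with an arbitrary split by `cyl` rather than the penguins example above.

```r
library(xgboost)

# Toy split of a built-in dataset; splitting on `cyl` is arbitrary and
# only stands in for a real grouping structure.
train <- mtcars[mtcars$cyl != 8, ]
valid <- mtcars[mtcars$cyl == 8, ]

# Column 1 of mtcars is the outcome (mpg); the rest are predictors.
dtrain <- xgb.DMatrix(as.matrix(train[, -1]), label = train$mpg)
dvalid <- xgb.DMatrix(as.matrix(valid[, -1]), label = valid$mpg)

# The user supplies the validation data explicitly; early stopping
# monitors the evaluation metric on it.
fit <- xgb.train(
  params = list(objective = "reg:squarederror", max_depth = 2),
  data = dtrain,
  nrounds = 50,
  watchlist = list(train = dtrain, valid = dvalid),
  early_stopping_rounds = 5,
  verbose = 0
)
```

Here the validation rows are chosen entirely by the user, which is the behavior the request asks parsnip to expose.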

This won't be at the top of our to-do list for now, but we'll leave this open as a possible future extension. :)

simonpcouch avatar Jul 13 '22 13:07 simonpcouch

Additional use case at https://community.rstudio.com/t/passing-other-recipe-roles-into-model-function/154793.

simonpcouch avatar Dec 12 '22 13:12 simonpcouch