Feature Request: ability to pass dataframe to `validation` argument of `xgboost`
Related to #760
The current implementation of the `validation` parameter in `boost_tree()` only lets you set the proportion of the training data to use as the validation set. It would be great to be able to pass a data frame to `validation` as well. This would be really helpful when there is a grouping structure within the data and you want to test whether the model generalizes to different groups, and it would bring parsnip's capabilities in line with `xgb.train()`.
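For comparison, here is a minimal sketch of the behavior as it stands today, where `validation` is a proportion of the training data. The value `0.2` and the object name `spec_current` are only illustrative choices, not anything prescribed by parsnip.

```r
library(parsnip)

# Current behavior: `validation` is a proportion (here 20%, an arbitrary
# illustrative value) of the training data that is held out internally
# and used for early stopping via `stop_iter`.
spec_current <-
  boost_tree(mode = "regression",
             mtry = 3,
             tree_depth = 2,
             stop_iter = 5) |>
  set_engine("xgboost", validation = 0.2)
```

With this interface the held-out rows are chosen for you, so there is no way to say which groups end up in the validation set.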
Not a great example, but just to demonstrate the proposed interface:
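library(dplyr)
library(parsnip)
library(modeldata)
data("penguins")

# Train on two species and validate on the remaining one
train <-
  penguins |>
  filter(species %in% c("Gentoo", "Adelie"))
valid <-
  penguins |>
  filter(!(species %in% c("Gentoo", "Adelie")))

# Proposed (not currently supported): pass a data frame to `validation`
boost_tree(mode = "regression",
           mtry = 3,
           tree_depth = 2,
           stop_iter = 5) |>
  set_engine("xgboost", validation = valid)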
Thanks for the issue! This is an interesting idea and one that we ought to consider. xgboost and lightgbm's interfaces for validation sets allow for a lot of user control, but we'd need to think carefully about what a tidymodels-esque interface might feel like here.
This won't be at the top of our to-do list for now, but we'll leave this open as a possible future extension. :)
Additional use case at https://community.rstudio.com/t/passing-other-recipe-roles-into-model-function/154793.