mlr3pipelines
New param `use_groups` for `PipeOpSubsample`
Closes https://github.com/mlr-org/mlr3pipelines/issues/567
If `use_groups = TRUE` (the default), we subsample whole groups. As a result, the fraction of rows actually retained only approximates `frac`.
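To illustrate why subsampling whole groups makes `frac` inexact, here is a minimal language-agnostic sketch in Python. It is not the code in this PR: the helper `subsample_groups` is hypothetical, and the rule of drawing `round(frac * n_groups)` groups is an assumption made only for the illustration.

```python
import random

def subsample_groups(groups, frac, seed=0):
    """Hypothetical helper: keep whole groups and return the kept row indices.

    Assumption: the number of groups drawn is round(frac * number_of_groups).
    """
    rng = random.Random(seed)
    group_ids = sorted(set(groups))
    n_keep = round(frac * len(group_ids))
    kept = set(rng.sample(group_ids, n_keep))
    # A row is kept exactly when its group was drawn, so groups are never split.
    return [i for i, g in enumerate(groups) if g in kept]

# Four groups of unequal size, 10 rows in total.
groups = ["a"] * 5 + ["b"] * 3 + ["c"] * 1 + ["d"] * 1
rows = subsample_groups(groups, frac=0.5)

# Two of four groups are kept, so 2, 4, 6, or 8 rows remain; never exactly 5.
realized_frac = len(rows) / len(groups)
print(len(rows), realized_frac)
```

With unequal group sizes, the realized fraction depends on which groups happen to be drawn, which is why `frac` can only be matched approximately.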
We currently don't support combining stratification (`stratify = TRUE`) with subsampling of grouped data, which matches how Resamplings behave in mlr3.
Because `use_groups` defaults to `TRUE`, this changes the default behavior for tasks that have a column with role "group".
Right now, `task$row_roles$use` is not respected when `use_groups = TRUE`. How do we want to handle that? Should we ignore a group for subsampling if it contains any row that is not in `task$row_roles$use`?
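For discussion, the option raised above (ignore any group that contains a row outside the rows in use) could look roughly like this. This is a Python sketch of the rule only, not mlr3 code; `eligible_groups` and its arguments are hypothetical names.

```python
def eligible_groups(groups, use_rows):
    """Hypothetical helper: return the groups whose rows are all in use.

    groups:   group label for each row, indexed by row position
    use_rows: positions of the rows that are in use
    """
    use = set(use_rows)
    # Any group with at least one row outside `use` is excluded entirely.
    excluded = {g for i, g in enumerate(groups) if i not in use}
    return sorted(set(groups) - excluded)

groups = ["a", "a", "b", "b", "c"]
use_rows = [0, 1, 2, 4]  # row 3, which belongs to group "b", is not in use

print(eligible_groups(groups, use_rows))  # ['a', 'c']
```

Under this rule a single unused row removes its whole group from the candidate pool, which is strict but keeps every sampled group intact.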