
New param `use_groups` for `PipeOpSubsample`

Open • advieser opened this issue 1 year ago • 1 comment

Closes https://github.com/mlr-org/mlr3pipelines/issues/567

If `use_groups = TRUE` (default), we subsample whole groups. Because groups can differ in size, the fraction of rows actually kept may deviate from `frac`. We currently don't support stratification (`stratify = TRUE`) and subsampling grouped data at the same time, same as with Resamplings in mlr3. Note that this changes the default behavior for tasks that have a column with role "group".
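To illustrate why `frac` is only approximate when whole groups are sampled, here is a minimal, language-neutral sketch in Python. It is not the code in this PR: the function name `subsample_groups` is hypothetical, and it assumes that `frac` is applied to the number of groups rather than the number of rows.

```python
import random
from collections import defaultdict


def subsample_groups(groups, frac, seed=0):
    """Return sorted row indices obtained by sampling whole groups.

    groups: list with one group label per row.
    frac:   fraction of groups to keep (assumption: applied to the
            number of groups, not the number of rows).
    """
    # Collect the row indices that belong to each group label.
    rows_by_group = defaultdict(list)
    for row, label in enumerate(groups):
        rows_by_group[label].append(row)

    labels = sorted(rows_by_group)
    n_keep = round(frac * len(labels))
    kept = random.Random(seed).sample(labels, n_keep)

    # Every row of a kept group is kept; no group is split.
    return sorted(row for label in kept for row in rows_by_group[label])


# Four groups with 5, 1, 2 and 2 rows (10 rows in total).
groups = ["a"] * 5 + ["b"] * 1 + ["c"] * 2 + ["d"] * 2
rows = subsample_groups(groups, frac=0.5)
print(len(rows) / len(groups))  # realized row fraction
```

With these group sizes, any two kept groups cover 3, 4, 6 or 7 of the 10 rows, so the realized row fraction is never exactly 0.5.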

Right now, `task$row_roles$use` is not respected when `use_groups = TRUE`. The question is how we want to handle that: if a group contains any row that is not in `task$row_roles$use`, should we ignore that group for subsampling?
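One possible reading of the rule asked about above, sketched in Python as a language-neutral illustration. This is not what the PR implements: `eligible_groups` and `use_rows` are hypothetical names, with `use_rows` standing in for `task$row_roles$use`.

```python
def eligible_groups(groups, use_rows):
    """Return labels of groups whose rows are all contained in use_rows."""
    use = set(use_rows)
    # A group is excluded as soon as one of its rows is not in use.
    excluded = {label for row, label in enumerate(groups) if row not in use}
    return sorted(set(groups) - excluded)


groups = ["a", "a", "b", "b", "c"]
use_rows = [0, 1, 2, 4]  # row 3 (group "b") is not in use
print(eligible_groups(groups, use_rows))  # ['a', 'c']
```

Under this rule, group "b" is dropped entirely even though row 2 is in use, which is the trade-off the question raises.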

advieser • Oct 03 '24 18:10