mlr3

Allow defining more strata in the `partition` function

Open • Fred-Wu opened this issue 3 years ago • 4 comments

Splitting data into training and test/holdout sets can be done with the `partition()` function, but it only allows stratification on the target variable.

However, I can already do the split using `rsmp()`, for example:

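```r
library(mlr3)

task_gc = tsk("german_credit")
# Stratify on the target and two additional feature columns
task_gc$col_roles$stratum = c("credit_risk", "housing", "telephone")
ho = rsmp("holdout", ratio = 0.8)
# instantiate() fixes the split and returns the resampling object itself
split = ho$instantiate(task_gc)
split$instance
```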
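A minimal sketch, assuming a current mlr3 release: besides inspecting `$instance`, an instantiated holdout can be queried through its `train_set()` and `test_set()` methods; a holdout has exactly one iteration, so iteration `1` is requested. `train_ids` and `test_ids` are illustrative names only. `task_gc$strata` reports how many rows fall into each stratum, i.e. each observed combination of the three stratum columns.

```r
library(mlr3)

task_gc = tsk("german_credit")
# Each stratum is one observed combination of these three columns
task_gc$col_roles$stratum = c("credit_risk", "housing", "telephone")

ho = rsmp("holdout", ratio = 0.8)
ho$instantiate(task_gc)

# A holdout has a single iteration, so ask for iteration 1
train_ids = ho$train_set(1)
test_ids = ho$test_set(1)

# Number of rows in each stratum
print(task_gc$strata$N)
```

Using the accessor methods keeps the code independent of how a particular resampling stores its instance internally.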

Just wondering: could this functionality be added to `partition()` so that more strata can be defined there? Then `rsmp()` could be kept for its original purpose, resampling the training data during model development. I think both are essentially doing the same thing here.

Fred-Wu • Mar 03 '22 14:03

@mllg Should I add this?

sebffischer • Apr 03 '22 10:04

This should already work:

```r
library(mlr3)

task_gc = tsk("german_credit")
# Columns with the "stratum" role are used for stratification
task_gc$col_roles$stratum = c("credit_risk", "housing", "telephone")
split = partition(task_gc, ratio = 0.8)
```

We should clearly document this better, though.
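Since `partition()` gives no direct feedback on which columns it stratified on, and its handling of pre-set strata may differ between mlr3 versions, a quick empirical check can be useful. A minimal sketch: it compares the share of each stratum (each combination of the three columns) among the training and test rows; if all three columns were used, the two columns of shares should be close. `strat_cols` and `stratum_shares()` are hypothetical names, and the seed is arbitrary.

```r
library(mlr3)

task_gc = tsk("german_credit")
strat_cols = c("credit_risk", "housing", "telephone")
task_gc$col_roles$stratum = strat_cols

set.seed(42)  # arbitrary seed, only for reproducibility
split = partition(task_gc, ratio = 0.8)

# Hypothetical helper: share of each stratum among the given row ids.
# drop = FALSE keeps all level combinations so train and test tables align.
stratum_shares = function(rows) {
  dt = task_gc$data(rows = rows, cols = strat_cols)
  prop.table(table(interaction(dt, drop = FALSE)))
}

train_share = stratum_shares(split$train)
test_share = stratum_shares(split$test)

# Shares should be similar in both sets if all three columns were used
print(round(cbind(train = train_share, test = test_share), 3))
```

Small strata can still differ slightly between the two sets, because row counts are rounded within each stratum.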

mllg • Apr 03 '22 10:04

Thanks. This does work.

Fred-Wu • Apr 04 '22 09:04

Hello. What is the difference between `partition()` and `rsmp()`?

skanskan • Jun 28 '23 09:06