PipeOp "classweights" should throw a warning if it is attached to a learner which does not have the "weights" property
I reran code from a now accepted paper of mine on cost-sensitive ML for psychologists with mlr3 in which we used classweights in combination with "classif.log_reg". Recently, mlr3learners removed the "weights" property from "classif.log_reg" (which I think was a very bad decision, see my related issue in the mlr3learners github repository: mlr-org/mlr3learners#265). This made me aware of the following issue in mlr3pipelines:
If I build a pipeline combining a "classweights" pipeop with "classif.log_reg", this will silently ignore my weights (because classif.log_reg does not have the weights property anymore). This behavior is extremely dangerous, and I only noticed it because in my paper I have computed both conditions (with and without weighting) and both gave exactly the same results!
Would it be possible to include a warning when using classweights with learners without this property?
library("mlr3verse")
task = tsk("german_credit")
costs = matrix(c(0, 1, 5, 0), nrow = 2)
dimnames(costs) = list(response = c("good", "bad"), truth = c("good", "bad"))
print(costs)
logreg = lrn("classif.log_reg", predict_type = "prob")
tw = po("classweights", minor_weight = 5)
tw_logreg = as_learner(tw %>>% logreg)
# we get exactly the same benchmark result for both conditions (normal logreg and logreg with weighting)
# the fact that classif.log_reg does not have a weights property (since mlr3learners version 0.5.4) is silently ignored!
set.seed(1)
bm = benchmark(benchmark_grid(list(task), list(logreg, tw_logreg),
rsmp("cv")), store_models = FALSE)
bm$aggregate(msr("classif.costs", costs = costs))
# nr resample_result task_id learner_id resampling_id iters classif.costs
# 1: 1 <ResampleResult[21]> german_credit classif.log_reg cv 10 0.851
# 2: 2 <ResampleResult[21]> german_credit classweights.classif.log_reg cv 10 0.851
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.3
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] mlr3verse_0.2.7 mlr3_0.14.1
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 xfun_0.37 clusterCrit_1.2.8 listenv_0.9.0 mlr3cluster_0.1.6
[6] generics_0.1.3 colorspace_2.1-0 vctrs_0.5.2 htmltools_0.5.4 bbotk_0.7.2
[11] yaml_2.3.7 paradox_0.11.0 utf8_1.2.3 rlang_1.0.6 pillar_1.8.1
[16] glue_1.6.2 withr_2.5.0 palmerpenguins_0.1.1 uuid_1.1-0 mlr3fselect_0.9.1
[21] lifecycle_1.0.3 mlr3learners_0.5.6 munsell_0.5.0 gtable_0.3.1 future_1.31.0
[26] codetools_0.2-18 evaluate_0.20 mlr3data_0.6.1 knitr_1.42 fastmap_1.1.0
[31] parallel_4.2.2 fansi_1.0.4 mlr3tuningspaces_0.3.3 renv_0.16.0 backports_1.4.1
[36] scales_1.2.1 checkmate_2.1.0 mlr3filters_0.7.1 mlr3viz_0.6.1 mlr3tuning_0.17.2
[41] parallelly_1.34.0 ggplot2_3.4.1 digest_0.6.31 dplyr_1.1.0 grid_4.2.2
[46] clue_0.3-64 cli_3.6.0 tools_4.2.2 magrittr_2.0.3 tibble_3.1.8
[51] cluster_2.1.4 mlr3misc_0.11.0 future.apply_1.10.0 pkgconfig_2.0.3 data.table_1.14.8
[56] mlr3pipelines_0.4.2 rmarkdown_2.20 rstudioapi_0.14 lgr_0.4.4 R6_2.5.1
[61] globals_0.16.2 compiler_4.2.2
An mlr3pipelines PipeOp does not know about what other operations happen before / after it. In this case, it may be worth considering whether an mlr3 Learner should throw an error when it is given a task with weights but without the weights property.