Handling competing risks in rfsrc/proba
Expected Behaviour
Benchmarking to complete for a competing risks model
Actual Behaviour
Error in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent
This error doesn't occur if I make sure the status variable is only 0 or 1. It also doesn't occur if I just run learner$train(follic_task).
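A possible workaround, just a sketch and not an official fix, is to collapse the competing event codes into a single 0/1 indicator before building the task. Note that this treats every event type as the event of interest, which changes the question being asked:

library(mlr3verse)
data(follic, package = "randomForestSRC")
# collapse any non-zero status (the competing event codes) into a single event indicator
follic$status <- as.integer(follic$status > 0)
follic_task <- as_task_surv(
  follic, event = "status", time = "time", type = "right"
)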
Reprex
library(mlr3verse)
#> Loading required package: mlr3
library(mlr3extralearners)
#>
#> Attaching package: 'mlr3extralearners'
#> The following objects are masked from 'package:mlr3':
#>
#> lrn, lrns
library(randomForestSRC)
#>
#> randomForestSRC 2.14.0
#>
#> Type rfsrc.news() to see new features, changes, and bug fixes.
#>
#>
#> Attaching package: 'randomForestSRC'
#> The following object is masked from 'package:mlr3verse':
#>
#> tune
data(follic, package = "randomForestSRC")
follic_task <- as_task_surv(
  follic, event = "status", time = "time", type = "right"
)
learner = lrn("surv.rfsrc")
benchmark(
  benchmark_grid(
    tasks = list(follic_task), learners = list(learner),
    resamplings = rsmp("cv", folds = 3)
  )
)
#> INFO [13:58:06.277] [mlr3] Running benchmark with 3 resampling iterations
#> INFO [13:58:06.354] [mlr3] Applying learner 'surv.rfsrc' on task 'follic' (iter 1/3)
#> Error in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent
Created on 2021-11-22 by the reprex package (v2.0.1)
Session info
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#>
#> Matrix products: default
#> BLAS/LAPACK: /Users/funnellt/miniconda3/envs/mbml/lib/libopenblasp-r0.3.18.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] randomForestSRC_2.14.0 mlr3extralearners_0.5.15 mlr3verse_0.2.2
#> [4] mlr3_0.13.0
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.5.0 RColorBrewer_1.1-2 bbotk_0.4.0
#> [4] data.tree_1.0.0 mlr3proba_0.4.2 mlr3pipelines_0.4.0
#> [7] mlr3learners_0.5.0 tools_4.1.1 backports_1.3.0
#> [10] utf8_1.2.2 R6_2.5.1 DBI_1.1.1
#> [13] colorspace_2.0-2 mlr3data_0.5.0 withr_2.4.2
#> [16] mlr3viz_0.5.7 mlr3misc_0.9.5 tidyselect_1.1.1
#> [19] compiler_4.1.1 cli_3.1.0 ooplah_0.1.0
#> [22] lgr_0.4.3 scales_1.1.1 checkmate_2.0.0
#> [25] palmerpenguins_0.1.0 mlr3tuning_0.9.0 stringr_1.4.0
#> [28] digest_0.6.28 rmarkdown_2.11 param6_0.2.3
#> [31] paradox_0.7.1 set6_0.2.3 pkgconfig_2.0.3
#> [34] htmltools_0.5.2 parallelly_1.28.1 fastmap_1.1.0
#> [37] highr_0.9 htmlwidgets_1.5.4 rlang_0.4.12
#> [40] visNetwork_2.1.0 generics_0.1.1 jsonlite_1.7.2
#> [43] dplyr_1.0.7 magrittr_2.0.1 Matrix_1.3-4
#> [46] Rcpp_1.0.7 mlr3fselect_0.6.0 munsell_0.5.0
#> [49] fansi_0.5.0 lifecycle_1.0.1 stringi_1.7.5
#> [52] yaml_2.2.1 grid_4.1.1 parallel_4.1.1
#> [55] dictionar6_0.1.3 listenv_0.8.0 crayon_1.4.2
#> [58] lattice_0.20-45 splines_4.1.1 mlr3cluster_0.1.2
#> [61] knitr_1.35 pillar_1.6.4 mlr3filters_0.4.2
#> [64] uuid_1.0-3 future.apply_1.8.1 codetools_0.2-18
#> [67] reprex_2.0.1 glue_1.5.0 evaluate_0.14
#> [70] data.table_1.14.2 vctrs_0.3.8 distr6_1.6.2
#> [73] gtable_0.3.0 purrr_0.3.4 clue_0.3-60
#> [76] future_1.23.0 assertthat_0.2.1 ggplot2_3.3.5
#> [79] xfun_0.27 pracma_2.3.3 survival_3.2-13
#> [82] tibble_3.1.6 cluster_2.1.2 DiagrammeR_1.0.6.1
#> [85] globals_0.14.0 ellipsis_0.3.2 clusterCrit_1.2.8
This isn't a bug: mlr3proba doesn't currently support competing risks. When it does, we will add this as a property for learners that handle it. For now, the behaviour above is expected.
If you can demonstrate the same problem with a non-competing-risks task, however, then it may be a bug!
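As a quick sanity check (just a sketch, assuming mlr3proba, mlr3extralearners and randomForestSRC are installed), the same learner on a standard single-event task such as the built-in rats task should benchmark without error:

library(mlr3verse)
library(mlr3extralearners)
benchmark(
  benchmark_grid(
    tasks = list(tsk("rats")), learners = list(lrn("surv.rfsrc")),
    resamplings = rsmp("cv", folds = 3)
  )
)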
@RaphaelS1 would this issue be better placed in another repo, or closed? Also, are there immediate plans to support competing-risks tasks? And if so, would that effort be something an mlr3 novice could easily contribute to?
@sebffischer can you transfer to mlr3proba?
@funnell whilst I love it when someone volunteers to contribute, unfortunately this first requires an internal design decision about how the implementation should look. After that, the coding itself is relatively straightforward.
Any preliminary thoughts on this @adibender ?
The Surv object does allow factor variables for the event in order to indicate competing-risks/multi-state outcomes, so specifying the task should be possible. How this is passed to the individual algorithms will be very heterogeneous, however, and not all algorithms have customised methods like RFSRC. For those that don't, we would have to split the task into K tasks internally (one for each competing outcome), fit the algorithms to each of them, and aggregate the results afterwards and during evaluation... not sure if mlr3 was designed for this, but maybe through pipelines? Or how is multi-task classification handled, for example?
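A rough sketch of both ideas, assuming only base survival plus the follic data from the reprex above (the factor labels are chosen purely for illustration):

library(survival)
data(follic, package = "randomForestSRC")

# (1) a factor event (first level = censoring) lets Surv encode competing-risks / multi-state outcomes
event <- factor(follic$status, levels = 0:2, labels = c("censored", "relapse", "death"))
y <- Surv(follic$time, event)

# (2) cause-specific splitting: one 0/1 data set per competing cause, with the other causes treated as censored
causes <- c(relapse = 1, death = 2)
cause_specific <- lapply(causes, function(k) {
  d <- follic
  d$status <- as.integer(d$status == k)
  d
})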
We'll find time for a design meeting to discuss properly, not an easy answer... Pipelines seems wrong because it's too specialised
Discussed with Andreas, we should move this forward at some point