mlr3proba

time-dependent ROC AUC issues

Open bblodfon opened this issue 3 years ago • 8 comments

Hi! I know that time-dependent ROC AUC may be a touchy subject, and I have seen that you want to implement your own versions, but I still found some probable issues worth looking at:

library("mlr3verse")
#> Loading required package: mlr3

task = tsk("rats")
task$select(c("litter", "rx"))
learner_lp = lrn("surv.glmnet")
prediction_lp = learner_lp$train(task, row_ids = 1:280)$predict(task, row_ids = 281:300)
#> Warning: Multiple lambdas have been fit. Lambda will be set to 0.01 (see
#> parameter 's').

measure_uno = msr('surv.uno_auc')
prediction_lp$score(measures = measure_uno, task = task, train_set = 1:280)
# if `times` is not given, it defaults to the sorted, unique test times
# https://github.com/mlr-org/mlr3proba/blob/main/R/MeasureSurvAUC.R#L52
#> surv.uno_auc 
#>    0.5007141

# now we supply our own `times` and let the package deal with them 
measure_uno$param_set$values = list(integrated = TRUE, times = seq(100, 10))
prediction_lp$score(measures = measure_uno, task = task, train_set = 1:280)
#> surv.uno_auc 
#>    0.2537478

# with `integrated = FALSE`, `times` must be a scalar - why is it implemented like this?
# I would like to get individual AUCs for each specified time point, since it's not an integrated measure!
# and it still fails even for that single time point (I guess the package's fault?)
measure_uno$param_set$values = list(integrated = FALSE, times = 100) 
prediction_lp$score(measures = measure_uno, task = task, train_set = 1:280)
#> Error in vapply(.x, .f, FUN.VALUE = .value, USE.NAMES = FALSE, ...): values must be length 1,
#>  but FUN(X[[1]]) result is length 3
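As a workaround for the scalar-`times` restriction, calling survAUC directly seems to return one AUC per time point. This is a sketch only (untested against this exact reprex): it re-runs the setup above and relies on survAUC's documented `AUC.uno(Surv.rsp, Surv.rsp.new, lpnew, times)` arguments.

```r
# Workaround sketch: bypass the measure and call survAUC directly to get
# the per-time AUC(t) values that `integrated = FALSE` currently errors on.
library(mlr3verse)
library(survAUC)

task = tsk("rats")
task$select(c("litter", "rx"))
learner_lp = lrn("surv.glmnet")
prediction_lp = learner_lp$train(task, row_ids = 1:280)$predict(task, row_ids = 281:300)

surv_train = task$truth(1:280)    # training-set Surv object
surv_test  = prediction_lp$truth  # test-set Surv object
lp_test    = prediction_lp$lp     # linear predictors on the test rows

res = survAUC::AUC.uno(surv_train, surv_test, lp_test, times = c(50, 100))
res$auc   # one AUC per requested time point
res$iauc  # integrated AUC over the supplied times
```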

measure_hung = msr('surv.hung_auc') # AUC way above 1? (I guess a problem in the package itself)
prediction_lp$score(measures = measure_hung, task = task, train_set = 1:280)
#> surv.hung_auc 
#>      57038.75

# Issue: Song's AUC requires `lp` but it's not there since `learner_lp$model$linear.predictors` is NULL, see:
# https://github.com/mlr-org/mlr3proba/blob/main/R/MeasureSurvAUC.R#L47
# Maybe this: `learner$predict(task, row_ids = train_set)$lp` is what is needed?
# The vector of predictors estimated from the training data (consulted the `AUC.sh` documentation)
measure_song = msr('surv.song_auc')
prediction_lp$score(measures = measure_song, task = task, train_set = 1:280, learner = learner_lp)
#> Error in FUN(lpnew = prediction$lp, Surv.rsp = structure(c(101, 49, 104, : argument "lp" is missing, with no default

# Same Issue
measure_cham = msr('surv.chambless_auc')
prediction_lp$score(measures = measure_cham, task = task, train_set = 1:280, learner = learner_lp)
#> Error in FUN(lpnew = prediction$lp, Surv.rsp = structure(c(101, 49, 104, : argument "lp" is missing, with no default
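A sketch of the fix suggested above (hypothetical, untested): both `AUC.sh` (Song) and `AUC.cd` (Chambless) take the training-set linear predictors as `lp` per the survAUC documentation, so `learner$predict(task, row_ids = train_set)$lp` could supply them.

```r
# Sketch: supply the *training* linear predictors that survAUC's
# Song (AUC.sh) and Chambless (AUC.cd) estimators require.
library(mlr3verse)
library(survAUC)

task = tsk("rats")
task$select(c("litter", "rx"))
learner_lp = lrn("surv.glmnet")
prediction_lp = learner_lp$train(task, row_ids = 1:280)$predict(task, row_ids = 281:300)

surv_train = task$truth(1:280)
surv_test  = prediction_lp$truth
lp_train   = learner_lp$predict(task, row_ids = 1:280)$lp  # lp on the training rows
lp_test    = prediction_lp$lp

times = c(50, 100)
res_sh = survAUC::AUC.sh(surv_train, surv_test, lp_train, lp_test, times)  # Song & Zhou
res_cd = survAUC::AUC.cd(surv_train, surv_test, lp_train, lp_test, times)  # Chambless & Diao
res_sh$auc
res_cd$auc
```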

Created on 2022-04-01 by the reprex package (v2.0.1)


A more general question I have is whether these ROC AUC methods are applicable to a surv.glmnet model (and other survival learners), or only to a surv.coxph model. I see that two of them assume a "Cox PH model specification" and two of them "random censoring". Without having read the papers behind these methods, does it make sense to use the last two (Uno's and Hung's), which only assume random censoring (an assumption I am willing to accept), with any type of survival learner in this package? Or are there other things I should watch out for before using these methods?

bblodfon avatar Apr 01 '22 16:04 bblodfon

I know that time-dependent ROC AUC is maybe a touchy subject

What do you mean?

I have seen that you want to implement your own versions

I do, assuming survAUC is still buggy

A more general question I have is whether these ROC AUC methods are applicable to a surv.glmnet model (and other survival learners), or only to a surv.coxph model. I see that two of them assume a "Cox PH model specification" and two of them "random censoring". Without having read the papers behind these methods, does it make sense to use the last two (Uno's and Hung's), which only assume random censoring (an assumption I am willing to accept), with any type of survival learner in this package? Or are there other things I should watch out for before using these methods?

There are three primary assumptions, depending on the AUC: 1) a Cox model; 2) random censoring; 3) the linear predictor is somehow proportionally related to the predicted distribution/survival time (which basically amounts to Cox).

So I believe glmnet should be okay because it still has the same linear model form as the Cox (though with the regularisation, unsure how this affects it).

I don't really have a good intuition for these unfortunately!

Fixed the Uno AUC bug here and made the Cox PH requirements more explicit.

RaphaelS1 avatar Apr 01 '22 21:04 RaphaelS1

Hi Raphael,

Thanks for your speedy response! Regarding the "touchy subject": definitely the wrong phrasing - it's more of a confusing matter to me, since I've heard different opinions about which R packages to use to get time-dependent AUCs, depending on the model, the times parameter, etc. For example, what if I want the AUC at a time point after the last time point in the training survival data? survAUC returns 0, but I don't know whether that is always correct. I will do some more investigation to sort this out and maybe come back with suggestions for other ways to compute AUCs, if that's on the table :)

I quickly checked your commits, and I have two more things on which I would kindly ask your opinion:

  1. The output of Hung's AUC is definitely wrong (an AUC should be between 0 and 1), but that has to do with the survAUC package itself, right?
  2. From what I've gathered, you always return one AUC: either the integrated one (a provided times vector and integrated = TRUE) or the AUC corresponding to one specific time point (a scalar times and integrated = FALSE). Why not allow the full result ($auc) to be returned, as seen below? Is it a matter of design?
library(mlr3verse)
#> Loading required package: mlr3
library(survAUC)
#> Loading required package: survival

task = tsk("rats")
task$select(c("litter", "rx"))
# learner_lp = lrn("surv.glmnet")  # left over; overridden by the next line
learner_lp = lrn("surv.coxph")
prediction_lp = learner_lp$train(task, row_ids = 1:280)$predict(task, row_ids = 281:300)

Surv.rsp = task$truth(1:280)
Surv.rsp.new = prediction_lp$truth
lpnew = prediction_lp$lp
times = c(1,10,50,100,120)
survAUC::AUC.uno(Surv.rsp, Surv.rsp.new, lpnew, times)
#> $auc
#> [1] 0.0500000 0.0500000 0.0500000 0.2537478 0.0000000
#> 
#> $times
#> [1]   1  10  50 100 120
#> 
#> $iauc
#> [1] 0.1510404
#> 
#> attr(,"class")
#> [1] "survAUC"

Created on 2022-04-02 by the reprex package (v2.0.1)

bblodfon avatar Apr 02 '22 10:04 bblodfon

I don't disagree about the touchy subject point. There are about two decades of argument in the literature about which AUC to use, when, and how. And the implementations across packages are not great (as you've seen with survAUC).

From what I remember, they are quite tricky to implement, which is why it's low priority for me, but I think it's also likely why no one else has tried and why survAUC has never been patched.

And yes, it's a design point. All measures need to be compatible with automated ML methods such as tuning, so returning a vector or list wouldn't be compatible with this. However, we could make better use of R6 and let measures be constructed like learners, with a slot like $model where we store the full output from the measure. I see no reason why I couldn't implement this now... @mllg I'm happy to do this in just proba, unless you think it could also be useful in mlr3 generally?

RaphaelS1 avatar Apr 03 '22 08:04 RaphaelS1

That's a nice idea with the $model property! Design-wise, should it be part of the measure class or the prediction class (like $score is)? Wouldn't it be a problem if I used the same measure to compare several models and the $model property were overwritten for each model?

Also, I was thinking that maybe this means that, design-wise, there is no need for the integrated property: prediction$score(measure) would always return one (possibly integrated) score, no matter the other inputs (e.g. a vector of times, or something else depending on the measure and the function called), and the full result would be accessible via the $model property you mentioned. As it currently stands, having integrated = FALSE and not allowing a vector of times for the ROC AUC scores was a bit confusing to me :)

Another example I was thinking of: what if, for a measure such as the C-index (surv.cindex), we wanted confidence intervals or p-values for the predictions made by a survival model (like survival::concordance() provides, for example) - would you put that in the $model as well?

bblodfon avatar Apr 03 '22 09:04 bblodfon

Another example I was thinking of: what if, for a measure such as the C-index (surv.cindex), we wanted confidence intervals or p-values for the predictions made by a survival model (like survival::concordance() provides, for example) - would you put that in the $model as well?

This is something we're working on separately: returning observation-level losses as well as aggregated losses, so we can generate our own SEs. But what you're suggesting also makes sense. BTW, I just used $model as an example; I guess here we'd actually call it $measure, and I agree putting this in the prediction class makes sense.

I'm not sure where it would make the most sense to store the output, though. In other languages you would construct one measure per prediction, and each one is mutated on scoring. The alternative is to store the evaluation in the prediction object, e.g. something like prediction$scores, which would return list(surv.hung_auc = {measure_output}, surv.cindex = {full_measure_output}).
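The idea sketched above could look roughly like this in R6. This is purely illustrative, not mlr3 API: the class name `ScoredPrediction`, the `$scores` field, and the `score()` helper are all hypothetical, as is the toy measure output.

```r
library(R6)

# Hypothetical sketch: a Prediction-like class whose score() returns the
# scalar that tuning needs, while $scores keeps the full measure output.
ScoredPrediction = R6Class("ScoredPrediction",
  public = list(
    scores = list(),  # full measure outputs, keyed by measure id
    score = function(id, fun, ...) {
      full = fun(...)            # e.g. the whole survAUC-style result
      self$scores[[id]] = full   # keep everything for later inspection
      if (is.list(full)) full$iauc else full  # scalar summary for tuning
    }
  )
)

p = ScoredPrediction$new()
p$score("surv.uno_auc",
        function() list(auc = c(0.05, 0.25), times = c(50, 100), iauc = 0.15))
p$scores$surv.uno_auc$auc  # full per-time AUCs remain accessible: c(0.05, 0.25)
```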

@mllg let me know if something like this could ever be implemented in mlr3, if not I think I might add to proba

RaphaelS1 avatar Apr 04 '22 13:04 RaphaelS1

We had a discussion recently to support observation-wise loss in prediction objects. I'll get back to you once this is finished.

mllg avatar Apr 05 '22 10:04 mllg

@mllg that was more of an example; here we mean, more generally, allowing the full output from functions to be stored within the object - basically giving the Prediction class a field called scores that lists the full output from measures, where applicable.

RaphaelS1 avatar Apr 06 '22 07:04 RaphaelS1

Hi! Any updates from the mlr3 team regarding this issue?

I think the proper solution here would be to allow MeasureSurv to return values at multiple time points simultaneously, because right now, when integrated = FALSE, the times parameter has to be a scalar, which is strange (you can only get the measure for one specific time point).

bblodfon avatar May 31 '22 15:05 bblodfon

@mllg @be-marc I guess, the way things are right now in mlr3, we can't return a vectorized AUC(t) measure for multiple time points, right? (A measure needs to return only a single scalar value.)

@RaphaelS1: with survivalmeasures we will make our own interface, so I guess we can support this feature there, and then we can come back and see what can be done in mlr3proba/mlr3? In any case, we need a new issue, since this one has a lot of stuff on it - okay?

bblodfon avatar Dec 04 '23 09:12 bblodfon

I updated the issue in https://github.com/bblodfon/survivalmeasures/issues/1 and later we can come back to this one if needed.

bblodfon avatar Dec 18 '23 17:12 bblodfon