case weights
It would be great if the calculations for the curve took case weights into account (i.e. a non-negative numeric vector with the same length as the other data objects).
I agree this would be cool. Do you have a reference on how this is implemented in the context of ROC curves?
The curve would be based on the weighted versions of sensitivity and specificity.
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
data(pathology)
str(pathology)
#> 'data.frame': 344 obs. of 2 variables:
#> $ pathology: Factor w/ 2 levels "abnorm","norm": 1 1 1 1 1 1 1 1 1 1 ...
#> $ scan : Factor w/ 2 levels "abnorm","norm": 1 1 1 1 1 1 1 1 1 1 ...
set.seed(1)
pathology$weights <- runif(nrow(pathology))
event <- "abnorm"
unweighted <-
sum(pathology$pathology == event & pathology$scan == event) /
sum(pathology$pathology == event)
unweighted
#> [1] 0.8953488
# via yardstick:
sensitivity(pathology, pathology, scan)
#> # A tibble: 1 × 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 sens binary 0.895
weighted <-
sum( pathology$weights * (pathology$pathology == event & pathology$scan == event) ) /
sum( pathology$weights * (pathology$pathology == event) )
weighted
#> [1] 0.9013333
Created on 2021-09-13 by the reprex package (v2.0.0)
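The same weighting carries over to specificity, which the curve also needs. Below is a minimal base R sketch on a small made-up data set (the vectors `truth`, `pred` and `w` are illustrative assumptions, not the pathology data), so it runs without tidymodels:

```r
# Toy data, made up for illustration only.
truth <- factor(c("abnorm", "abnorm", "abnorm", "norm", "norm", "norm"))
pred  <- factor(c("abnorm", "abnorm", "norm",   "norm", "norm", "abnorm"))
w     <- c(2, 1, 1, 3, 1, 2)
event <- "abnorm"

# Weighted sensitivity: weight of correctly flagged events
# divided by the total weight of events.
w_sens <- sum(w * (truth == event & pred == event)) / sum(w * (truth == event))

# Weighted specificity: weight of correctly cleared non-events
# divided by the total weight of non-events.
w_spec <- sum(w * (truth != event & pred != event)) / sum(w * (truth != event))

w_sens
#> [1] 0.75
w_spec
#> [1] 0.6666667
```

With all weights equal to 1, both expressions reduce to the usual unweighted proportions (2/3 each for this toy data).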
@davisvaughan has the start of changes that we will be making to yardstick here
I think I see. The easiest approach would be to update roc.utils.perfs.all.fast directly so that it calculates TP/FP taking the weights into account:
tp <- cumsum((response.sorted == 1) * weights.sorted)
fp <- cumsum((response.sorted == 0) * weights.sorted)
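To make the idea concrete, here is a rough, self-contained sketch of how weighted cumulative sums could yield the ROC points. It is not pROC's actual internal code: `weighted_roc_points` is a hypothetical helper, the response is assumed to be coded 0/1, and higher predictor values are assumed to indicate cases. Note the parentheses around the comparisons, since `*` binds tighter than `==` in R.

```r
# Hypothetical helper, not part of pROC.
weighted_roc_points <- function(response, predictor,
                                weights = rep(1, length(response))) {
  stopifnot(length(response) == length(predictor),
            length(response) == length(weights),
            all(weights >= 0))

  # Sort by decreasing predictor so each position is a candidate threshold.
  ord <- order(predictor, decreasing = TRUE)
  response.sorted  <- response[ord]
  predictor.sorted <- predictor[ord]
  weights.sorted   <- weights[ord]

  # Weighted running totals of true and false positives.
  tp <- cumsum((response.sorted == 1) * weights.sorted)
  fp <- cumsum((response.sorted == 0) * weights.sorted)

  # For tied predictor values, keep only the last row of each run.
  keep <- !duplicated(predictor.sorted, fromLast = TRUE)

  # Total weights replace the case and control counts; they may be fractional.
  w_cases    <- sum(weights.sorted[response.sorted == 1])
  w_controls <- sum(weights.sorted[response.sorted == 0])

  data.frame(
    threshold   = predictor.sorted[keep],
    sensitivity = tp[keep] / w_cases,
    specificity = 1 - fp[keep] / w_controls
  )
}

# Toy data, made up for illustration.
response  <- c(1, 1, 0, 1, 0, 0)
predictor <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
weights   <- c(0.5, 1.5, 1, 2, 0.5, 1.5)

pts <- weighted_roc_points(response, predictor, weights)
pts
```

Calling the helper without `weights` falls back to unit weights, which should reproduce the unweighted sensitivities and specificities.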
A few thoughts on the implementation:
- The number of cases and controls might become fractional because of this change. I'm not sure what side-effects this could have.
- There's a C++ algorithm that will need to be updated too. It's a loop so it should be quite straightforward. Alternatively it could be a good time to get rid of alternative algorithms and simplify the code.
- It will be necessary to modify the roc objects and store the weights there, so that bootstrap functions re-use the weights appropriately.
- At this point I'm not sure how many changes will be required in those bootstrapping functions. They've needed major refactoring for a long time but I never found the time to do so.
- Issue #70 will get in the way. There's quite a lot of redundancy, as pROC has several functions that build ROC curves under the hood (i.e. auc, ci, etc.), all of which will have to be updated.
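As a rough illustration of the bootstrap point above, the sketch below shows why the weights have to travel with the observations: each resample indexes the response, predictor and weights with the same indices. Both `weighted_auc` and `boot_weighted_auc` are hypothetical helpers, not pROC functions, and the stratified resampling, the 0/1 coding and the "higher predictor means case" direction are assumptions made for this example.

```r
# Hypothetical helpers, not part of pROC.
weighted_auc <- function(response, predictor, weights) {
  cases    <- response == 1
  controls <- response == 0
  # Compare every case with every control; ties count one half.
  cmp <- outer(predictor[cases], predictor[controls],
               function(x, y) (x > y) + 0.5 * (x == y))
  # Each pair is weighted by the product of its two case weights.
  pair_w <- outer(weights[cases], weights[controls])
  sum(cmp * pair_w) / sum(pair_w)
}

# Resample a vector of indices with replacement (safe for length-1 input).
resample <- function(x) x[sample.int(length(x), length(x), replace = TRUE)]

boot_weighted_auc <- function(response, predictor, weights, n_boot = 200) {
  replicate(n_boot, {
    # Stratified: cases and controls are resampled separately.
    idx <- c(resample(which(response == 1)), resample(which(response == 0)))
    # The same indices select the weights, so they stay paired with their rows.
    weighted_auc(response[idx], predictor[idx], weights[idx])
  })
}

# Toy data, made up for illustration.
response  <- c(1, 1, 0, 1, 0, 0)
predictor <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
weights   <- c(0.5, 1.5, 1, 2, 0.5, 1.5)

auc_hat <- weighted_auc(response, predictor, weights)

set.seed(1)
boots <- boot_weighted_auc(response, predictor, weights)
quantile(boots, c(0.025, 0.975))
```

If the weights were not stored alongside the data, a resample would have no way to recover which weight belongs to which resampled row.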