pROC

What does "direction" mean in roc function

Open Ivy-ops opened this issue 1 year ago • 3 comments

Hi developer, I am trying to use the roc() function with my dataset. After reading the description of the "direction" argument, I still cannot understand what it means, and I would highly appreciate your help. I use a random forest to get the class probabilities of each sample (shown below); the second column is for the "Case" group.

My dataset, rf$predictions:

         Control      Case
 [1,] 0.24642643 0.7535736
 [2,] 0.33507026 0.6649297
 [3,] 0.45731121 0.5426888
 [4,] 0.46547831 0.5345217
 [5,] 0.53042247 0.4695775
 [6,] 0.31020475 0.6897952
 [7,] 0.15786178 0.8421382
 [8,] 0.15340136 0.8465986
 [9,] 0.15774135 0.8422587
[10,] 0.18421489 0.8157851
[11,] 0.64663338 0.3533666
[12,] 0.40697185 0.5930282
[13,] 0.37198661 0.6280134
[14,] 0.57076432 0.4292357
[15,] 0.18086131 0.8191387
[16,] 0.58201416 0.4179858
[17,] 0.19227444 0.8077256
[18,] 0.46165459 0.5383454
[19,] 0.19301864 0.8069814
[20,] 0.66767106 0.3323289
[21,] 0.80801017 0.1919898
[22,] 0.66952125 0.3304788
[23,] 0.62995097 0.3700490
[24,] 0.50042121 0.4995788
[25,] 0.77477208 0.2252279
[26,] 0.60949394 0.3905061
[27,] 0.82625698 0.1737430
[28,] 0.65935287 0.3406471
[29,] 0.07350427 0.9264957
[30,] 0.72550278 0.2744972
[31,] 0.72104726 0.2789527
[32,] 0.65799964 0.3420004
[33,] 0.70231445 0.2976856
[34,] 0.32174162 0.6782584
[35,] 0.86845567 0.1315443
[36,] 0.50935250 0.4906475
[37,] 0.44772867 0.5522713
[38,] 0.78675787 0.2132421

actual [1] Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Control [21] Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control Levels: Case Control

table(actual, predict)
         predict
actual    Case Control
  Case      15       4
  Control    3      16

Then I use roc function:

pROC::roc(actual, rf$predictions[,2], levels = c('Case','Control'), plot=T, direction = '>')

Call:
roc.default(response = actual, predictor = rf$predictions[, 2], levels = c("Case", "Control"), direction = ">", plot = T)

Data: rf$predictions[, 2] in 19 controls (actual Case) > 19 cases (actual Control).
Area under the curve: 0.8726

pROC::roc(actual, rf$predictions[,2], levels = c('Case','Control'), plot=T, direction = '<')

Call:
roc.default(response = actual, predictor = rf$predictions[, 2], levels = c("Case", "Control"), direction = "<", plot = T)

Data: rf$predictions[, 2] in 19 controls (actual Case) < 19 cases (actual Control).
Area under the curve: 0.1274

As the output above shows, I get two different AUCs depending on the direction. I referred to the roc() tutorial and to https://stackoverflow.com/questions/31756682/what-does-coercing-the-direction-argument-input-in-roc-function-package-proc, which say that direction determines whether the probability is compared to the threshold with "<" or ">".
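Editorial note: the two AUCs in the post (0.8726 and 0.1274) sum to 1, and a minimal base-R sketch can show why. It relies on pROC's documented convention that levels lists the control level first and the case level second, so levels = c('Case','Control') treats the actual Case samples as controls. The helper auc_by_direction() is hypothetical, written only for this illustration from the pairwise (Mann-Whitney) definition of AUC; it is not pROC's own code. The scores are the Case column posted above, and actual is rebuilt as 19 Case followed by 19 Control.

```r
# Case-class probabilities (second column of rf$predictions) copied from the post.
p_case <- c(
  0.7535736, 0.6649297, 0.5426888, 0.5345217, 0.4695775, 0.6897952,
  0.8421382, 0.8465986, 0.8422587, 0.8157851, 0.3533666, 0.5930282,
  0.6280134, 0.4292357, 0.8191387, 0.4179858, 0.8077256, 0.5383454,
  0.8069814, 0.3323289, 0.1919898, 0.3304788, 0.3700490, 0.4995788,
  0.2252279, 0.3905061, 0.1737430, 0.3406471, 0.9264957, 0.2744972,
  0.2789527, 0.3420004, 0.2976856, 0.6782584, 0.1315443, 0.4906475,
  0.5522713, 0.2132421
)
# True labels: samples 1-19 are Case, samples 20-38 are Control.
actual <- rep(c("Case", "Control"), each = 19)

# Hypothetical helper (not part of pROC): pairwise AUC, ties count as 1/2.
# levels[1] is the control level, levels[2] the case level.
auc_by_direction <- function(score, response, levels, direction) {
  controls <- score[response == levels[1]]
  cases <- score[response == levels[2]]
  diffs <- outer(cases, controls, "-")   # case score minus control score
  if (direction == ">") diffs <- -diffs  # ">" expects controls to score higher
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

auc_gt <- auc_by_direction(p_case, actual, c("Case", "Control"), ">")
auc_lt <- auc_by_direction(p_case, actual, c("Case", "Control"), "<")
auc_swapped <- auc_by_direction(p_case, actual, c("Control", "Case"), "<")

print(round(c(auc_gt, auc_lt, auc_swapped), 4))  # 0.8726 0.1274 0.8726
```

Under these assumptions, levels = c('Control','Case') with direction = '<' describes the same comparison as the first call in the post, which is why it gives the same AUC.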

Does direction mean the following: for the 1st sample, if I use threshold = 0.5 and direction ">", then because 0.7535736 > 0.5, sample 1 is predicted as "Case"? And if I use threshold = 0.5 and direction "<", what does direction mean? I am quite confused. When should I use ">" and when should I use "<"? Looking forward to your help. Much appreciated!

Ivy-ops • Feb 07 '24 23:02

Thanks for your report.

I'm not sure what's unclear exactly. What do you suggest should be clarified precisely, and can you maybe make some suggestions of better ways to explain that?

xrobin • Feb 09 '24 13:02

Hi @xrobin, thanks for the reply. Based on the tutorial:

“>”: if the predictor values for the control group are higher than the values of the case group (controls > t >= cases).
“<”: if the predictor values for the control group are lower or equal than the values of the case group (controls < t <= cases).

In my case, does direction mean the following: for the 1st sample (predicted probability for Control = 0.24642643, for Case = 0.7535736), if I use threshold = 0.5 and direction ">", then because 0.7535736 > 0.5, sample 1 is predicted as "Case"? And if I use threshold = 0.5 and direction "<", what does direction mean? Thank you for your patience!
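Editorial note: the tutorial text quoted above can be read as a labelling rule, sketched here in base R for the 1st sample. The helper label_at_threshold() is hypothetical and only mirrors the wording "controls > t >= cases" and "controls < t <= cases"; it is not how pROC is implemented. It assumes, as pROC documents, that levels[1] is the control level and levels[2] is the case (positive) level.

```r
# Hypothetical helper (not part of pROC): label scores at one threshold.
# levels[1] is the control (negative) level, levels[2] the case (positive) level.
label_at_threshold <- function(score, threshold, direction, levels) {
  # "<": cases sit at or above the threshold; ">": cases sit at or below it.
  positive <- if (direction == "<") score >= threshold else score <= threshold
  ifelse(positive, levels[2], levels[1])
}

sample1 <- 0.7535736  # Case probability of the 1st sample in the post

# levels as written in the post: "Case" is the control level,
# "Control" is the positive level.
lab_gt <- label_at_threshold(sample1, 0.5, ">", c("Case", "Control"))
lab_lt <- label_at_threshold(sample1, 0.5, "<", c("Case", "Control"))

# levels swapped so that "Case" is the positive level.
lab_swapped <- label_at_threshold(sample1, 0.5, "<", c("Control", "Case"))

print(c(lab_gt, lab_lt, lab_swapped))  # "Case" "Control" "Case"
```

Read this way, direction never names a class by itself. It only says on which side of the threshold the second entry of levels lies, so the answer to "which label does sample 1 get" depends on both arguments together.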

Ivy-ops • Feb 09 '24 16:02

I attempted to clarify the documentation. Here is the new description of direction:

how are positive observations defined?
“<”: observations are positive when they are greater than or equal (>=) to the threshold.
“>”: observations are positive when they are smaller than or equal (<=) to the threshold.
“auto” (default): automatically detect in which group the median is higher and take the direction accordingly. See details.
You should set this explicitly to “>” or “<” whenever you are resampling or randomizing the data, otherwise the curves will be biased towards higher AUC values.
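Editorial note: the warning about resampling can be illustrated with a small base-R simulation. This is a sketch under stated assumptions, not pROC's code: pairwise_auc() and simulate_once() are hypothetical helpers, the "auto" rule is approximated by comparing group medians as the description above says, and both groups are drawn from the same normal distribution so the true AUC is 0.5.

```r
set.seed(1)

# Hypothetical helper: pairwise (Mann-Whitney) AUC, ties count as 1/2.
pairwise_auc <- function(controls, cases, direction) {
  diffs <- outer(cases, controls, "-")
  if (direction == ">") diffs <- -diffs
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# One simulated data set in which the predictor is pure noise.
simulate_once <- function(n = 20) {
  controls <- rnorm(n)
  cases <- rnorm(n)
  # Median-based choice, mimicking the "auto" description above.
  auto_dir <- if (median(controls) > median(cases)) ">" else "<"
  c(fixed = pairwise_auc(controls, cases, "<"),
    auto = pairwise_auc(controls, cases, auto_dir))
}

res <- replicate(500, simulate_once())  # 2 x 500 matrix: rows "fixed", "auto"
mean_auc <- rowMeans(res)
print(round(mean_auc, 3))
```

With the direction fixed, the average AUC stays close to 0.5. Letting the direction follow the medians in every replicate pushes the average above 0.5, which is the bias the documentation warns about.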

Is it clearer like this?

xrobin • Mar 05 '24 07:03