
AUC again

Open tlienart opened this issue 4 years ago • 3 comments

The current implementation, adapted from https://www.ibm.com/developerworks/community/blogs/jfp/entry/Fast_Computation_of_AUC_ROC_score?lang=en (that page is now unavailable...), is much faster than a basic approach. However, it gives different values from (say) sklearn when there are (many) ties in the scores (see e.g. this comment: https://www.kaggle.com/c/ieee-fraud-detection/discussion/107601).

The risk of this happening in a real setting is fairly low, but it can stump people who are testing things out. One way around this would be a "slow mode": the user would pass a keyword slow=true to guarantee that the computation is identical to sklearn's.

I'm not sure there's a lot of value in this in general, but we could point people to it when they try things out and conclude that our implementation is just wrong.
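For concreteness, here's a minimal sketch of why ties matter (the function names are made up for this issue; none of this is MLJBase's actual code). The only difference between the two routines is whether a tied positive/negative pair counts for nothing or for 1/2, the latter being what sklearn's roc_auc_score effectively does:

```julia
# Minimal sketch (not MLJBase's code): the only difference between the two
# functions is how a tied (positive, negative) pair is scored.
function auc_pairwise_strict(scores, labels)
    pos = scores[labels .== 1]
    neg = scores[labels .== 0]
    # a tied pair counts for nothing
    sum(p > n for p in pos, n in neg) / (length(pos) * length(neg))
end

function auc_pairwise_half(scores, labels)
    pos = scores[labels .== 1]
    neg = scores[labels .== 0]
    # a tied pair counts for 1/2, matching sklearn's roc_auc_score
    sum((p > n) + 0.5 * (p == n) for p in pos, n in neg) / (length(pos) * length(neg))
end

# extreme case: every score is tied
scores = fill(0.5, 20)
labels = vcat(ones(Int, 10), zeros(Int, 10))
auc_pairwise_strict(scores, labels)  # 0.0
auc_pairwise_half(scores, labels)    # 0.5
```

With no ties the two definitions coincide, which is consistent with the fast and slow versions agreeing on untied test data.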

tlienart avatar May 23 '20 12:05 tlienart

lol. @tlienart. It's probably better if we make the sklearn version the default and then add a fast or speedup keyword for the faster version.

OkonSamuel avatar May 24 '20 01:05 OkonSamuel

:joy: Ok, I should probably have written this differently. If you look at the tests in #311 you'll see that the two versions return the same thing (one is just 32x faster on average than the other). Basically it comes down to the proportion of ties in the scores relative to the length of the sample: if that proportion is relatively high, the results will diverge.

Anyway, I'd like to bring in some of EvalMetrics.jl (which does a "conservative" AUC computation), and I think your suggestion of having keywords to select the method makes sense. I'd be in favour of switching to the "fast" one by default when n > 100, all of which could be documented.

tlienart avatar May 24 '20 09:05 tlienart

The current implementation, adapted from https://www.ibm.com/developerworks/community/blogs/jfp/entry/Fast_Computation_of_AUC_ROC_score?lang=en (now that page is unavailable...) is much faster than a basic approach.

It's too bad that link is broken. 😢 As far as I can tell from the code, MLJBase and scikit-learn both use the basic approach described in the 2006 paper by Tom Fawcett. The other approach I've seen is to use the Wilcoxon-Mann-Whitney statistic to calculate AUC, though I'm not certain whether that actually leads to a different algorithm...
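For what it's worth, here's a rough sketch of the rank-based (Wilcoxon-Mann-Whitney) route, assuming StatsBase's tiedrank for the midranks (the function name is hypothetical). With midranks, ties get the same 1/2 credit as in the trapezoidal convention, so it looks like a different way of computing the same number rather than a different algorithm:

```julia
# Rough sketch of the rank-based route (again, not MLJBase's code), using
# StatsBase.tiedrank so that tied scores get midranks.
using StatsBase: tiedrank

function auc_wmw(scores, labels)
    r = tiedrank(scores)                       # midranks for tied scores
    npos = count(==(1), labels)
    nneg = length(labels) - npos
    rpos = sum(r[labels .== 1])                # rank sum of the positives
    # AUC = U / (npos * nneg), with U the Mann-Whitney U statistic
    (rpos - npos * (npos + 1) / 2) / (npos * nneg)
end

auc_wmw(fill(0.5, 20), vcat(ones(Int, 10), zeros(Int, 10)))  # 0.5
```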

Actually, now that I look closer, it seems the MLJBase algorithm doesn't implement the procedure that Fawcett uses to handle ties:

[image: excerpt from Fawcett (2006) showing the tie-handling step of his ROC/AUC algorithm]

It looks like MLJBase gets close to the same numbers as the Fawcett algorithm when the positives and negatives that share a score are well mixed after sorting by score. But if, after sorting, you have 10 positives in a row followed by 10 negatives in a row, all with the same score, the difference in AUC between MLJBase and the Fawcett algorithm is larger.
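To make that concrete, here's a minimal sketch of Fawcett's procedure (his Algorithm 2 from the 2006 paper), written from the paper rather than taken from MLJBase: a ROC point is only emitted when the score changes, so a run of tied scores contributes a single trapezoid and tied positive/negative pairs effectively receive half credit.

```julia
# Minimal sketch of Fawcett's (2006) Algorithm 2, not the MLJBase routine:
# a ROC point is only emitted when the score changes, so a block of tied
# scores contributes one trapezoid and tied pairs effectively get credit 1/2.
trap(x1, x2, y1, y2) = abs(x1 - x2) * (y1 + y2) / 2

function auc_fawcett(scores, labels)
    order = sortperm(scores, rev = true)       # decreasing score
    P = count(==(1), labels)
    N = length(labels) - P
    tp = fp = 0
    tp_prev = fp_prev = 0
    area = 0.0
    f_prev = NaN                               # sentinel: the first score always "changes"
    for i in order
        if scores[i] != f_prev
            area += trap(fp, fp_prev, tp, tp_prev)
            f_prev = scores[i]
            fp_prev, tp_prev = fp, tp
        end
        labels[i] == 1 ? (tp += 1) : (fp += 1)
    end
    area += trap(N, fp_prev, P, tp_prev)       # close the curve at (1, 1)
    return area / (P * N)
end

# the "10 positives then 10 negatives, all tied" case lands exactly on 0.5
auc_fawcett(fill(0.5, 20), vcat(ones(Int, 10), zeros(Int, 10)))  # 0.5
```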

CameronBieganek avatar Jun 04 '20 04:06 CameronBieganek