yellowbrick Update ROCAUC to aid interpretation

Whenever I use a ROC plot I have to remind myself what it means. In particular: what do the axis labels mean, and where are the thresholds on the plot? It doesn't help that Wikipedia's https://en.wikipedia.org/wiki/Receiver_operating_characteristic page has a heap of formulas and that its confusion matrix example uses different conventions from sklearn's.

I'd like clearer descriptions for the axis labels and a guide to interpreting the thresholds that'll give me a point on the curve that I might choose (which is very likely not to be the 0.5 default threshold in sklearn). I suggest an example below.

Here's the current plot in 0.9.1. I've used the standard cancer dataset with 1 feature and a default LogisticRegression:

Here is my suggestion for a more interpretable plot for discussion (feel very free to push back, maybe I added too much!):

My suggestions are:

1. Add formula annotations to the x and y axes
2. Add "0" and "1" to the labels along with the human-readable class names (personally I work with False or True classes and only think about the human-readable names afterwards)
3. Add 3 increasing-size circles to mark the points on each curve closest to the decision thresholds 0.25, 0.5 (the sklearn default) and 0.75, to give me an idea of which threshold I might want to choose
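
As a rough sketch of the third suggestion: given arrays shaped like the `fpr, tpr, thresholds` triple that `sklearn.metrics.roc_curve` returns (thresholds in decreasing order, with a sentinel first entry above every score), the curve point nearest each target threshold can be looked up as below. The helper name `closest_points` and the numbers are illustrative assumptions, not taken from the plot in this issue.

```python
def closest_points(fpr, tpr, thresholds, targets=(0.25, 0.5, 0.75)):
    """Map each target threshold to the nearest point on the curve.

    Returns {target: (fpr, tpr, actual_threshold)}.
    """
    result = {}
    for target in targets:
        # Index of the curve threshold with the smallest distance to the target.
        idx = min(range(len(thresholds)), key=lambda i: abs(thresholds[i] - target))
        result[target] = (fpr[idx], tpr[idx], thresholds[idx])
    return result


# Made-up curve, laid out the way roc_curve lays out its output.
fpr = [0.0, 0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.4, 0.8, 0.95, 1.0]
thresholds = [float("inf"), 0.9, 0.55, 0.2, 0.05]

marks = closest_points(fpr, tpr, thresholds)
for target, (x, y, actual) in marks.items():
    print(f"target {target}: nearest curve threshold {actual} at FPR={x}, TPR={y}")
```

Because the curve only has points at thresholds that occur in the scores, the marked point can sit some distance from the requested threshold, so returning the actual threshold alongside the coordinates seems worthwhile.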

I don't actually like the increasing-size circles but I'm not sure how to better introduce this idea.

This idea is built out of this lovely blog post: https://lukeoakdenrayner.wordpress.com/2018/01/07/the-philosophical-argument-for-using-roc-curves/

In the blog post colour was used but with multiple curves that'll get messy really quickly so I figured avoiding that might be better.

I was introduced to this post after my talk at PyDataAmsterdam last year (which included yb): https://twitter.com/ianozsvald/status/1000373609888706560

Apologies for leaving this suggestion for so long, I wrote prototype code after PyDataAmsterdam last May, then got distracted, then lost it!

Thoughts?

Feb 10 '19 22:02 ianozsvald

First, thank you so much for including YB in your PyData Amsterdam talk! And by the way - the "confusion probabilities" would make an awesome visualizer ...

I've got to say that I'm with you in that I rarely use ROC/AUC for diagnostics and prefer Precision-Recall curves and other tools (particularly since I mostly deal with multiclass classifiers). ROCAUC is in Yellowbrick for exactly the reason you mentioned - it's in every tutorial and we couldn't not include it. I had seen that blog before (maybe you tweeted it?) and I could get on board with it, but it's never been a requested feature until now!

So for suggestions 1&2: may I propose that we add an expressive=True keyword argument to the visualizer? If this is specified then we can go ahead and add the more detailed axes labels and 0/1 to the legend, but have it be False by default. Would that meet the requirement?
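
To make the proposal concrete, here is a minimal sketch of how an `expressive` flag could switch the label text. The function names, the exact label strings and the legend format are all hypothetical and only illustrate the idea; none of this is existing Yellowbrick API.

```python
def roc_axis_labels(expressive=False):
    """Return (xlabel, ylabel) for a ROC plot; hypothetical helper."""
    if expressive:
        # Spell out the formulas so readers need not look them up.
        return (
            "False Positive Rate = FP / (FP + TN)",
            "True Positive Rate = TP / (TP + FN)",
        )
    return ("False Positive Rate", "True Positive Rate")


def legend_label(class_name, class_index, auc, expressive=False):
    """Return a legend entry, optionally prefixed with the 0/1 class index."""
    prefix = f"{class_index}: " if expressive else ""
    return f"{prefix}{class_name} (AUC = {auc:.2f})"


print(roc_axis_labels(expressive=True))
print(legend_label("malignant", 0, 0.9234, expressive=True))
```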

I really like the idea of marking the 25%, 50% and 75% thresholds, but like you I am not sold on the varying-sized circles. I also agree that the color map in the blog post would be tough to read. I suppose adding dotted vertical and horizontal lines that were semi-transparent would be too much noise? What about different markers, e.g. x, d, ^?
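
A small sketch of the different-markers idea, assuming the threshold points have already been located on the curve. `x`, `d` and `^` are standard matplotlib marker codes; the helper names and the sample points are assumptions made for illustration.

```python
# One marker per threshold instead of one size per threshold.
THRESHOLD_MARKERS = {0.25: "x", 0.5: "d", 0.75: "^"}


def marker_specs(points, markers=THRESHOLD_MARKERS):
    """Turn (threshold, fpr, tpr) tuples into plotting specs."""
    specs = []
    for threshold, fpr, tpr in points:
        specs.append({
            "x": fpr,
            "y": tpr,
            "marker": markers.get(threshold, "o"),  # fall back to a circle
            "label": f"threshold = {threshold:g}",
        })
    return specs


def draw_markers(ax, specs, color="black"):
    """Draw each spec as a single unconnected marker on a matplotlib Axes."""
    for spec in specs:
        ax.plot(
            [spec["x"]], [spec["y"]],
            linestyle="none", marker=spec["marker"],
            markersize=9, color=color, label=spec["label"],
        )


# Made-up points in (threshold, fpr, tpr) form.
specs = marker_specs([(0.25, 0.3, 0.95), (0.5, 0.1, 0.8), (0.75, 0.0, 0.4)])
```

With matplotlib available, `fig, ax = plt.subplots()` followed by `draw_markers(ax, specs)` and `ax.legend()` would overlay the three markers on an existing ROC plot. Passing the curve's own colour as `color` keeps the markers tied to their curve when several classes are drawn.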

Feb 11 '19 03:02 bbengfort

You're very welcome! It was the Confusion Probabilities part that got me the feedback about ROC curves and, given that article, I think my chart (whilst being pretty) is less useful than an annotated ROC curve. I meant to link the slides from PyDataAmsterdam, here they are: https://ianozsvald.com/2018/05/26/creating-correct-and-capable-classifiers-at-pydataamsterdam-2018/

My chart does show distributions at a specified cutoff (using the default 0.5 threshold in the talk) but I wonder how useful that is, as you cut through the distributions as you change the threshold in a ROC curve so you "see" the distributions expressed via the curve. My chart is shown in this photo (and in my slides above) in case anyone else has thoughts? https://twitter.com/tobias_sterbak/status/1000290231717974016/photo/1

Re. an expressive keyword - I'm fine with however you'd want to handle it. Wouldn't you want more expressive on by default (and possibly for other plots too) out of the box? The goal of the charts is communication and often the consumers will be less-technical folk (e.g. in presentations), so having helpers to aid interpretability is probably a good thing. At least - that's my guess about the wider consuming audience based on how I've used them. Does your experience differ?

Maybe knowing who the intended audience is for yellowbrick in general helps define how much should be shown by default (and I figure asking this might be useful because maybe the consuming audience has evolved over several years?).

Re. the thresholds - yes I threw in the sizing on the circles just to make the point, they do look ugly! I thought about user-specified thresholds too but I'm not sure on the utility. For me I'd always wondered "on a ROC chart when I see the sticky-outy-to-the-left sweetspot - is that with a 0.5 threshold or something else?". I wonder if just showing a single 0.5 threshold point (as a circle?) is sufficient to indicate that maybe there's a different threshold to be chosen, then it is up to the user to do some testing?

Or we might let them specify a threshold which is 0.5 by default, so they could tune it using the chart. The downside is the addition of yet-another-parameter. Possibly making it a constant in __init__ and waiting to see if anyone wants to override it is a useful starting place?
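
For the single-threshold variant, the point to mark can be computed directly from the labels and predicted probabilities. The sketch below is plain Python with a hypothetical `roc_point` helper and made-up data; it treats a score greater than or equal to the threshold as a positive prediction, which is an assumption about the convention to use.

```python
def roc_point(y_true, y_score, threshold=0.5):
    """Return (fpr, tpr) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    # Guard against a class being absent from y_true.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

print(roc_point(y_true, y_score))        # default 0.5 threshold
print(roc_point(y_true, y_score, 0.25))  # a lower, user-chosen threshold
```

On this toy data the default threshold gives a point lower on the curve than 0.25 does, which is the kind of gap a single marker would make visible.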

Feb 11 '19 09:02 ianozsvald

Can I be assigned this issue?

Feb 13 '19 04:02 dnabanita7

@bbengfort just for completeness - if there was a desire to include my histograms-of-probabilities confusion matrix variant, the code is online at In[27] in the Notebook I used for my PyDataAmsterdam talk: https://github.com/ianozsvald/data_science_delivered/blob/master/ml_creating_correct_capable_classifiers.ipynb

Feb 13 '19 10:02 ianozsvald

@Naba7 - We really appreciate your enthusiasm to contribute to Yellowbrick! We don’t usually assign people to individual tasks unless it is something only they can take care of. We usually proceed by having the contributor open a PR for the issue, and then the discussion takes place there. We see that you have already opened a PR for #684, which is great! When you are ready to take care of the other issues, please go ahead and submit a PR for them as well.

Also, please bear with us during this time. We are all volunteers and are currently overloaded. We do intend to get back to you but it might take longer than usual.

Thanks again for contributing to Yellowbrick!

Feb 13 '19 18:02 pdamodaran

Hi @ianozsvald sorry again for the delay, as @pdamodaran mentioned we are a bit overloaded with issues this week!

Thanks for providing the code for the histograms-of-probabilities confusion matrix variant, I think it could be an interesting visualizer so I'll add it to the issues if you don't mind!

I think you've won me over with your argument about expressiveness. We do view yellowbrick as having two audiences - students who are trying to understand what the ML is doing under the hood and seasoned professionals who are using the visualizations as "at a glance diagnostics". I don't think that the extra text or formulae get in the way of the latter and may enhance the experience of the former, so why turn it off?

Another potential labeling convention is "sensitivity" and "specificity" as per: http://mfviz.com/binary-predictions/ -- though I wonder if this only makes sense in the binary case.
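
For reference, the standard definitions tie these names to the ROC axes as follows: sensitivity is the y axis, and the x axis is one minus specificity.

```latex
\[
\begin{aligned}
\text{sensitivity} &= \text{TPR} = \frac{TP}{TP + FN} \\
\text{specificity} &= \text{TNR} = \frac{TN}{TN + FP} = 1 - \text{FPR}
\end{aligned}
\]
```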

As for the markers for the thresholds, this is indeed a tough one. I'll continue to look into it as we get into this feature request. Out of curiosity, have you used the DiscriminationThreshold visualizer? What did you think of it? (I'm trying to remember if it was in the video or not, I've forgotten!)

Feb 15 '19 00:02 bbengfort

Hey @bbengfort. I have no strong view on adding specificity/sensitivity, as you note this gets less useful in the multi-class case (but then, with many classes these visualisations are probably less useful in general as too many lines just get hard to read). Re https://yellowbrick.readthedocs.io/en/latest/api/classifier/threshold.html I think I didn't use it in my talks but I did investigate it at the time, I've not used it since. It does look sensible. Re. the thresholds - I do think that identifying at least the 0.5 default threshold has utility, I suspect many people think it'll be where "the bump" might be in the curve but there's no reason for that to be the case. If that led folk to ask questions about using other thresholds and visualising them, you'd want them to use the threshold visualisation. But...that then lacks TPR and TNR so you couldn't take your view from the ROC curve and figure out a better threshold on the other (unless a new line or two were added to the threshold plot too). Using the threshold plot to show thresholds via various metrics seems the way-more-useful place to keep all of this for sure.

Feb 19 '19 18:02 ianozsvald

I'm not going to pretend to any expertise yet, based on a cursory reading, but in answer to the first question: in the interests of brevity I would call the x axis 'ALARM (-)' and the y axis 'DETECTION (+)'.

In response to the second part: the use of circles clearly starts adding complexity to the graph and raises questions of visibility, understanding, meaning and comprehension. As such would it not be more advisable to use the circles (or squares or any other shapes) in outline form (shell design) and to use arrows to indicate direction or trajectory? These pointers or markers could then be explored more fully in (colour / shade / pattern coded) sub-graphs to draw out more meaningful content.

Mar 12 '19 12:03 gavsideas

Hi @gavsideas, thanks for adding to the discussion. We are on a very short hiatus and will provide feedback once we return. Cheers.

Mar 12 '19 13:03 lwgray
