yellowbrick

Probability Calibration Curve

Open • bbengfort opened this issue 7 years ago • 12 comments

When performing classification, one often wants to predict not only the class label but also the associated probability, which gives a level of confidence in the prediction. The sklearn.calibration.calibration_curve function bins the predicted probabilities and returns, for each bin, the true fraction of positives and the mean predicted probability. These curves can be plotted for multiple models to select the best model based on reliability.
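A minimal sketch of the scikit-learn call described above, assuming a synthetic binary dataset from make_classification and GaussianNB as the model (both chosen only for illustration). Each pair printed is one point on the calibration curve: a well-calibrated model has the observed fraction close to the mean predicted probability.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary data, used here purely as an example
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

model = GaussianNB().fit(X_train, y_train)
prob_pos = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# prob_true: fraction of positives per bin; prob_pred: mean predicted probability per bin.
# Empty bins are dropped, so there may be fewer than n_bins points.
prob_true, prob_pred = calibration_curve(y_test, prob_pos, n_bins=10)
for p, t in zip(prob_pred, prob_true):
    print(f"mean predicted {p:.2f} -> observed fraction {t:.2f}")
```

Plotting prob_true against prob_pred, together with the diagonal y = x, gives the reliability diagram.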

[Figure: Probability Calibration Curve]

Alexandru Niculescu-Mizil and Rich Caruana (2005) Predicting Good Probabilities With Supervised Learning, in Proceedings of the 22nd International Conference on Machine Learning (ICML). See section 4 (Qualitative Analysis of Predictions).

bbengfort • Mar 24 '18

looking into this :)

augustbleeds • May 14 '18

Hi @bbengfort, I am interested in contributing to this feature.

https://nbviewer.jupyter.org/github/saurabhdaalia/ProbablityCurve/blob/master/Curve.ipynb I have put together this very naive example to get an idea of what exactly we are trying to implement. I would love some feedback on it so I can lay out some initial steps for implementing the feature.

saurabhdaalia • Mar 09 '19

Hi @saurabhdaalia, thanks for taking an interest in Yellowbrick. However, @bbengfort is going to be away for the next couple of weeks. You are welcome to open a pull request to address the issue. I encourage you to check out the contributor's guide for our conventions around branching, API, testing, etc. (https://www.scikit-yb.org/en/develop/contributing/index.html). If you don't hear back from us in a couple of weeks, just give us another ping. We look forward to your contribution.

lwgray • Mar 09 '19

Hi @bbengfort,

I was hoping to get some clarification regarding the implementation of the visualizer. When we show the calibration curve using the reliability diagram, should only the perfectly calibrated reference be plotted as a baseline, or do we also need to plot another model as a baseline? In the figure, Logistic Regression is also plotted for comparison.

saurabhdaalia • Apr 02 '19

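Does it make sense if we add something like this?

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# ProbabilityCalibrationCurve is the proposed visualizer; base is an optional comparison model
proba_curve = ProbabilityCalibrationCurve(GaussianNB(), base=LogisticRegression())
proba_curve.fit(X_train, y_train)
proba_curve.score(X_test, y_test)
proba_curve.poof()

The user can pass a base parameter, which would be a secondary model plotted alongside for comparison.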

saurabhdaalia • Apr 02 '19
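A rough sketch of what the proposed estimator-plus-base comparison could draw, written with plain matplotlib and sklearn.calibration.calibration_curve rather than any Yellowbrick class. The helper name plot_calibration_curves and its signature are hypothetical; its estimator and base arguments only mirror the ProbabilityCalibrationCurve(GaussianNB(), base=LogisticRegression()) proposal, and the synthetic dataset is an assumption for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def plot_calibration_curves(estimator, base, X_train, y_train, X_test, y_test,
                            n_bins=10, ax=None):
    """Hypothetical helper: reliability diagram for `estimator` and a `base` model."""
    if ax is None:
        _, ax = plt.subplots()
    # The diagonal is the perfectly calibrated reference line
    ax.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
    curves = {}
    for model in (base, estimator):
        name = type(model).__name__
        model.fit(X_train, y_train)
        prob_pos = model.predict_proba(X_test)[:, 1]
        prob_true, prob_pred = calibration_curve(y_test, prob_pos, n_bins=n_bins)
        ax.plot(prob_pred, prob_true, "s-", label=name)
        curves[name] = (prob_true, prob_pred)
    ax.set_xlabel("Mean predicted probability")
    ax.set_ylabel("Fraction of positives")
    ax.legend(loc="lower right")
    return ax, curves


X, y = make_classification(n_samples=2000, n_features=20, n_informative=2,
                           n_redundant=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
ax, curves = plot_calibration_curves(GaussianNB(), LogisticRegression(),
                                     X_train, y_train, X_test, y_test)
```

Here the base model is drawn on the same axes as the main estimator, so the diagonal stays the absolute reference while the base curve gives a relative one.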

Hi @saurabhdaalia, thanks for the questions and comments. We are currently working through our backlog of PRs and issues and will address this as soon as possible.

lwgray • Apr 03 '19

@saurabhdaalia I am assigning this issue to you

lwgray • May 22 '19

@lwgray, thank you. I'm going to start working on it ASAP.

saurabhdaalia • May 24 '19

@saurabhdaalia how's it going? Can I help in any way?

lwgray • May 31 '19

Some neat examples from R: https://rdrr.io/cran/rms/man/val.prob.html https://rdrr.io/github/BavoDC/CalibrationCurves/man/val.prob.ci.2.html

ohadle • Sep 24 '19
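The R val.prob function linked above reports summary statistics next to its calibration plot, one of which is the Brier score. A minimal Python analogue of just that one statistic, assuming sklearn.metrics.brier_score_loss and the same kind of synthetic data and two models used elsewhere in this thread (all illustrative choices, not a port of val.prob):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary data, used here purely as an example
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

scores = {}
for model in (LogisticRegression(), GaussianNB()):
    name = type(model).__name__
    prob_pos = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    # Mean squared difference between predicted probability and the 0/1 outcome; lower is better
    scores[name] = brier_score_loss(y_test, prob_pos)
    print(f"{name}: Brier score = {scores[name]:.3f}")
```

A single number like this could sit in a plot legend or title to complement the visual comparison, though it mixes calibration with discrimination and is not a pure calibration measure.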

Hi all, I am looking for a way to plot a reliability diagram (accuracy vs. confidence) for a multi-class task. Do you have any resources in mind for plotting reliability diagrams for such tasks? Thank you.

Mahhos • May 03 '21
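One common way to build the multi-class accuracy-vs-confidence diagram asked about above is to use the top-label confidence: take the highest predicted class probability as the confidence, bin it, and compare each bin's mean confidence with its accuracy. The sketch below assumes that approach; the helper top_label_reliability, the equal-width binning, and the synthetic three-class dataset are all hypothetical choices, not an existing Yellowbrick API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def top_label_reliability(y_true, proba, classes, n_bins=10):
    """Hypothetical helper: per-bin mean confidence, accuracy, and count.

    Confidence is the highest predicted class probability; a prediction is
    correct when the class with that probability equals the true label.
    """
    y_true = np.asarray(y_true)
    confidence = proba.max(axis=1)
    predicted = np.asarray(classes)[proba.argmax(axis=1)]
    correct = (predicted == y_true).astype(float)
    # Equal-width bins over [0, 1]; clip so confidence == 1.0 lands in the last bin
    bin_ids = np.minimum((confidence * n_bins).astype(int), n_bins - 1)
    confs, accs, counts = [], [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():  # skip empty bins
            confs.append(confidence[mask].mean())
            accs.append(correct[mask].mean())
            counts.append(int(mask.sum()))
    return np.array(confs), np.array(accs), np.array(counts)


X, y = make_classification(n_samples=3000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
confs, accs, counts = top_label_reliability(y_test, clf.predict_proba(X_test), clf.classes_)
```

Plotting accs against confs, together with the diagonal, gives the reliability diagram; with three classes the top-label confidence is never below 1/3, so the lowest bins stay empty.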

@Mahhos this does sound like an interesting visualizer; do you have some examples you could point us to?

bbengfort • Sep 25 '21