MAPIE
Added binary classification support to MAPIE using the Mondrian conformal predictor
Description
MAPIE was unable to perform confidence estimation on binary classification problems. To address this issue, I have implemented the Mondrian conformal predictor as a method of the MapieClassifier. This method is described in detail on page 5 of this paper; in essence, it uses a separate quantile for each class to determine inclusion in the prediction set, rather than a single quantile computed from all classes together. The method is not restricted to binary classification and should work for imbalanced multiclass problems as well.
Closes #216
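As a rough illustration of the difference (a minimal sketch, not the code in this PR): the score is assumed to be one minus the predicted probability of a class, and the names `conformal_threshold` and `class_thresholds` are hypothetical. A single threshold is taken from all calibration scores, while the Mondrian variant takes one threshold per class from that class's scores only.

```python
import numpy as np


def conformal_threshold(scores: np.ndarray, alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: nothing can be excluded.
        return float("inf")
    return float(np.sort(scores)[k - 1])


def class_thresholds(scores, y, n_classes, alpha):
    """One threshold per class, each computed from that class's scores only."""
    return np.array(
        [conformal_threshold(scores[y == k], alpha) for k in range(n_classes)]
    )


# Toy imbalanced calibration set: 8 samples of class 0, 4 of class 1.
# Each score is assumed to be 1 - (predicted probability of the true class).
y_cal = np.array([0] * 8 + [1] * 4)
scores_cal = np.array(
    [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.40, 0.55, 0.60, 0.70]
)
alpha = 0.25

single = conformal_threshold(scores_cal, alpha)            # 0.55
per_class = class_thresholds(scores_cal, y_cal, 2, alpha)  # [0.25, 0.70]

# One test point with predicted probabilities [0.60, 0.40].
test_scores = 1.0 - np.array([[0.60, 0.40]])
set_single = test_scores <= single       # class 0 only
set_mondrian = test_scores <= per_class  # class 1 only (broadcast per class)
```

On this toy data the single threshold is dominated by the majority class, so the minority class is dropped from the set, whereas the per-class thresholds keep it.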
Type of change
Please remove options that are irrelevant.
- New feature (non-breaking change which adds functionality)
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
- [x] Ran the make test command, and all tests passed
I also tested the changes on a workflow that uses MAPIE, and switching the method to mondrian gave similar results.
Checklist
- [x] I have read the contributing guidelines
- [x] I have updated the HISTORY.rst and AUTHORS.rst files
- [x] Linting passes successfully: make lint
- [x] Typing passes successfully: make type-check
- [x] Unit tests pass successfully: make tests
- [x] Coverage is 100%: make coverage
- [x] Documentation builds successfully: make doc
Unfortunately, I do not have much experience writing tests and do not know the best way to do this, so if anyone can help or offer advice on that front, I would be grateful.
Hey, I have tried to read the error logs of the failed tests, but they seem to fail while installing numpy, and I cannot see why this happens. If anyone has advice on how to fix it, that would be much appreciated.
Can we get this merged? It would be super useful.
Hello @adamzenith,
Thank you for submitting your pull request to propose the Mondrian Conformal Predictor. I have read with interest your modifications and proposals to implement this method. I hope I understood correctly and that the correction elements I bring you will be relevant. Don't hesitate to share your feedback with me!
1. Your PR in a nutshell
You have proposed an implementation of the Mondrian Conformal Predictor as a method of the MapieClassifier.
- The goal of this method is to ensure a conditional coverage of $1-\alpha$ for each class by computing the $1-\alpha$ quantile of the conformal scores for each class to determine their inclusion in the prediction set.
- As you stated, this method is not limited to binary classification and should also work for unbalanced multiclass problems.
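Written out, the two guarantees differ only in the conditioning. Here $\hat{C}_\alpha(X)$ denotes the prediction set for input $X$, a notation assumed for this illustration and not taken from the PR:

$$
\text{marginal:}\quad P\big(Y \in \hat{C}_\alpha(X)\big) \ge 1-\alpha,
\qquad
\text{class-conditional:}\quad P\big(Y \in \hat{C}_\alpha(X) \mid Y = k\big) \ge 1-\alpha \ \text{ for every class } k.
$$

The marginal guarantee can hold while a rare class is covered far less often than $1-\alpha$, which is the situation the per-class quantiles are meant to address.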
2. Our feedback on the PR
We believe that the Mondrian Conformal Predictor could be a good enhancement to MAPIE, as it has been mentioned and popularized in related work on drug discovery. However, at this time, we lack a comparison with the existing methods in MAPIE that would show the value of this method in specific use cases. We need concrete examples (in Jupyter notebooks, for example) demonstrating that the Mondrian Conformal Predictor is better than the other methods in MAPIE for solving binary or unbalanced multiclass problems. This will be a demonstration not only for us but for all MAPIE users. We invite you to consult the existing notebooks to help you.
3. Additional comments to improve your code
My suggestions are about modifications to make your code as generic as possible.
- I noticed that you have added elements that work specifically on your development settings and are therefore not intended for generic use in MAPIE (as in .gitignore and Makefile). This is not a problem in itself for code execution, but we prefer to keep the code as generic as possible.
- In the same vein, you have adapted the compute_quantiles function with a new parameter named mondrian (in the utils.py file). Even if exceptions exist, we prefer to continue to implement generic functions that do not depend on external method attributes, especially when these choices impact the size and shape of the output (since as many quantiles as classes are computed when mondrian=True, whereas only one quantile is computed with mondrian=False).
4. Actions to be taken
I propose a list of actions to help you improve your proposal and help us integrate it into MAPIE:
- Propose a notebook that demonstrates the value of using the Mondrian Conformal Predictor in place of the other multi-class conformal predictors proposed in MapieClassifier.
- Delete non-generic settings in .gitignore and Makefile files (related to your virtual environment mapieenv).
- Implement a new function named compute_class_quantiles which performs the same function as compute_quantiles with the parameter mondrian=True in the utils.py file.
- Correct typing errors (such as line breaks in mapie/classification.py).
I remain available if you have any questions and thank you in advance for your feedback.
I tested this out myself and it worked well. Nice job.
Would be very helpful!!