Classification: option to disable input formatting [wip]
What does this PR do?
Fixes #1526 Fixes #1604 Fixes #1989 Fixes #2195 Fixes #2329
Before submitting
- [x] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
- [x] accuracy.py
- [x] auroc.py
- [x] average_precision.py
- [ ] calibration_error.py
- [ ] cohen_kappa.py
- [x] confusion_matrix.py
- [ ] dice.py
- [ ] exact_match.py
- [x] f_beta.py
- [ ] group_fairness.py
- [x] hamming.py
- [ ] hinge.py
- [ ] jaccard.py
- [ ] matthews_corrcoef.py
- [ ] precision_fixed_recall.py
- [x] precision_recall.py
- [x] precision_recall_curve.py
- [ ] ranking.py
- [ ] recall_fixed_precision.py
- [x] roc.py
- [x] specificity.py
- [ ] specificity_sensitivity.py
- [x] stat_scores.py
PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃
@SkafteNicki how is it going here? :rabbit:
Hi @SkafteNicki, do you guys have any update on this one please?
We need to address failing checks
regarding #1989 I would consider dropping auto: as it stands, if your first inputs happen to fall in the (0, 1) range but are actually logits, you would end up with the inputs wrongly treated as probabilities... right?
@SkafteNicki what's left here? :)
I'd probably name things a bit differently and stick to string arguments only, e.g. no True, and instead of False have a "none" arg.
the resulting docstring would then be:
input_format: str specifying the format of the input preds tensor. Can be one of:
- ``'auto'``: automatically detect the format based on the values in the tensor. If all values
are in the [0,1] range, we consider the tensor to be probabilities and only threshold the values.
If all values are non-float, we consider the tensor to be labels and do nothing. Else we consider the
tensor to be logits and will apply sigmoid to the tensor and threshold the values.
- ``'probs'``: preds tensor contains values in the [0,1] range and is considered to be probabilities. Only
thresholding will be applied to the tensor, and values will be checked to be in the [0,1] range.
- ``'logits'``: preds tensor contains values outside the [0,1] range and is considered to be logits. We
will apply sigmoid to the tensor and threshold the values before calculating the metric.
- ``'labels'``: preds tensor contains integer values and is considered to be labels. No formatting will be
applied to the preds tensor.
- ``'none'``: will disable all input formatting. This is the fastest option but also the least safe.
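For concreteness, here is a minimal sketch of what a helper implementing that docstring could look like. The name `_format_preds` and the exact control flow are assumptions for illustration, not the PR's actual implementation:

```python
import torch

def _format_preds(preds: torch.Tensor, input_format: str = "auto", threshold: float = 0.5) -> torch.Tensor:
    """Hypothetical normalization helper mirroring the proposed docstring."""
    if input_format in ("none", "labels"):
        return preds  # trust the user: no formatting at all
    if input_format == "probs":
        if preds.min() < 0 or preds.max() > 1:
            raise ValueError("Expected probabilities in [0, 1]; did you mean input_format='logits'?")
        return (preds > threshold).long()
    if input_format == "logits":
        return (preds.sigmoid() > threshold).long()
    if input_format == "auto":
        if not preds.is_floating_point():
            return preds  # integer tensor -> treat as labels
        if preds.min() >= 0 and preds.max() <= 1:
            return (preds > threshold).long()  # looks like probabilities -> threshold only
        return (preds.sigmoid() > threshold).long()  # otherwise logits -> sigmoid, then threshold
    raise ValueError(f"Unknown input_format: {input_format!r}")
```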
@SkafteNicki, how is it going? What is missing... :flags:
setting the PR as ready just to enable full testing...
Codecov Report
Merging #1676 (a7d719b) into master (4c999b8) will decrease coverage by 0%. Report is 14 commits behind head on master. The diff coverage is 36%.
Additional details and impacted files
@@ Coverage Diff @@
## master #1676 +/- ##
======================================
- Coverage 69% 69% -0%
======================================
Files 307 307
Lines 17352 17406 +54
======================================
+ Hits 11961 11992 +31
- Misses 5391 5414 +23
Thanks for working on the bug in #1604! I suspect this bug is silently affecting people currently using torchmetrics.
It looks like the current solution is a new argument `input_format` that gives the user the option to specify if they are providing probabilities, logits, etc. (see the documentation below). This almost fixes the problem, but the argument defaults to `input_format='auto'`, which will cause exactly the bug discussed in #1604 (i.e. if your logits happen to all be in [0, 1], this will incorrectly fail to convert them to probabilities).
I see two options:
1. Change the default to `input_format='logits'` (or `'probs'`, though I think logits is the most common input). This default is explicit about behavior that is a bit subtle and will prevent the bug from happening by default.
2. Get rid of the `auto` option altogether. The `auto` option will always be buggy if the user inputs logits.
I personally see no reason to include the `auto` option and would vote for option 2.
input_format: str specifying the format of the input preds tensor. Can be one of:
- ``'auto'``: automatically detect the format based on the values in the tensor. If all values
are in the [0,1] range, we consider the tensor to be probabilities and only threshold the values.
If all values are non-float, we consider the tensor to be labels and do nothing. Else we consider the
tensor to be logits and will apply sigmoid to the tensor and threshold the values.
- ``'probs'``: preds tensor contains values in the [0,1] range and is considered to be probabilities. Only
thresholding will be applied to the tensor, and values will be checked to be in the [0,1] range.
- ``'logits'``: preds tensor contains values outside the [0,1] range and is considered to be logits. We
will apply sigmoid to the tensor and threshold the values before calculating the metric.
- ``'labels'``: preds tensor contains integer values and is considered to be labels. No formatting will be
applied to the preds tensor.
- ``'none'``: will disable all input formatting. This is the fastest option but also the least safe.
@idc9 thanks for giving your opinions on this issue.
I already laid out some of my opinions on Slack, but just for transparency, here are the two main reasons for having auto as the default:
- To keep everything backwards compatible
- To say "The auto option will always be buggy if the user inputs logits" is simply wrong. For reasonably sized inputs, any model should really output logit values outside the [0,1] range; logits inside [0,1] correspond, after the sigmoid transformation, to probabilities in the [0.5, 0.73] range. If a model only outputs values in the 0.5-0.73 probability range, that means it is never going to predict the negative class and is never confident about the positive class. Both seem highly unlikely for any well-trained model.
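To make the arithmetic concrete, logits that happen to lie in [0, 1] map under the sigmoid to probabilities in roughly [0.5, 0.73]:

```python
import torch

# Logits in [0, 1] map to probabilities in roughly [0.5, 0.73],
# so the "auto" heuristic would misread such logits as probabilities.
logits = torch.tensor([0.0, 0.5, 1.0])
print(torch.sigmoid(logits))  # tensor([0.5000, 0.6225, 0.7311])
```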
@SkafteNicki In cases where GPU memory limits batch sizes to be very small, there will be no "reasonably sized input", so "auto" will be buggy. What about not having "auto", but rather having only "probs" and "logits", with "probs" being the default and verifying that its input is in [0, 1]? This would be backwards compatible, except that it would raise an error exactly where the old version was doing the wrong thing. The error message can suggest that the user might mean input_format="logits".
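A small sketch of what that validation might look like; the helper name `_check_probs` and the message wording are illustrative, not from the PR:

```python
import torch

def _check_probs(preds: torch.Tensor) -> torch.Tensor:
    # Illustrative check for a 'probs' default: fail loudly instead of
    # silently misinterpreting logits, and point the user at the likely fix.
    if preds.is_floating_point() and (preds.min() < 0 or preds.max() > 1):
        raise ValueError(
            "preds contains values outside [0, 1]; if these are raw scores, "
            "you may want input_format='logits' instead."
        )
    return preds
```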
Sorry I should have been more careful with my language! Let me rephrase “The auto option will always be buggy if the user inputs logits” to “The ‘auto’ option leaves open the possibility for the bug to happen when the user inputs logits”.
Since the auto option leaves open the possibility of this bug occurring, I suspect we don't want to preserve backwards compatibility. This is a subtle/unexpected issue; many users won't immediately realize it can be a problem. Given it's straightforward for the user to specify logits/probs, I'm curious if there is any case where 'auto' is useful (i.e. should it just be removed)?
As @SkafteNicki pointed out the logits bug will only occur if all of the predicted probabilities lie in [0.5, 0.73]. It’s true this is likely not the usual scenario but it probably does come up. A few examples where it may be more common include: small batches, early phases of model training, difficult applications where hold-out probabilities are (sadly) closeish to 0.5, unbalanced classes, situations where your label prediction rule thresholds the probability at some value different from 0.5.
@NoahAtKintsugi and @idc9 Alright, I am willing to give in on this, but we are then going to break backwards compatibility, which I do not take as a light issue. If we do this change, we need to do it right. You are probably right that it is better to break stuff now and fix this issue, which may be a hidden problem for some users.
@Borda and @justusschock please give some opinions on this. Do we
- continue with `input_format="auto"` as the default, keeping everything backwards compatible
- set `input_format="auto"` for v1.4 (next release) and include a user warning (sketched below) that it will change to `input_format="logits"`/`"probs"` (which one we choose does not matter to me) in v1.5
- something else...
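A minimal sketch of that deprecation path, assuming a `None` sentinel default so we can tell whether the caller set the argument explicitly (names and message wording are illustrative):

```python
import warnings
from typing import Optional

def _resolve_input_format(input_format: Optional[str] = None) -> str:
    # Hypothetical deprecation shim: warn only when the caller relies on the
    # implicit default, so explicit callers are unaffected.
    if input_format is None:
        warnings.warn(
            "input_format currently defaults to 'auto'; the default will change to "
            "'logits' in v1.5. Pass input_format explicitly to silence this warning.",
            FutureWarning,
        )
        return "auto"
    return input_format
```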
I think due to stability, we may add a warning but have a hard switch from 2.0. cc: @lantiga
Defaulting to logits might make the most sense. I suspect it's more common for users to output logits than probabilities (e.g. https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html, https://github.com/huggingface/pytorch-image-models/blob/main/train.py, https://github.com/pytorch/examples/blob/main/imagenet/main.py), probably since this avoids a small bit of extra computation.
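For illustration, typical PyTorch classifiers (like the linked examples) end in a plain `nn.Linear` and return raw logits; getting probabilities takes an explicit extra step:

```python
import torch
import torch.nn as nn

# A typical classifier head: the final Linear layer returns unbounded logits.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.randn(4, 1, 28, 28)
logits = model(x)               # raw scores, not probabilities
probs = logits.softmax(dim=-1)  # probabilities require an explicit softmax
```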
@SkafteNicki, how is this one going? :)