
Classification: option to disable input formatting [wip]

Open SkafteNicki opened this issue 2 years ago • 18 comments

What does this PR do?

Fixes #1526 Fixes #1604 Fixes #1989 Fixes #2195 Fixes #2329

Before submitting
  • [x] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?
  • [x] accuracy.py
  • [x] auroc.py
  • [x] average_precision.py
  • [ ] calibration_error.py
  • [ ] cohen_kappa.py
  • [x] confusion_matrix.py
  • [ ] dice.py
  • [ ] exact_match.py
  • [x] f_beta.py
  • [ ] group_fairness.py
  • [x] hamming.py
  • [ ] hinge.py
  • [ ] jaccard.py
  • [ ] matthews_corrcoef.py
  • [ ] precision_fixed_recall.py
  • [x] precision_recall.py
  • [x] precision_recall_curve.py
  • [ ] ranking.py
  • [ ] recall_fixed_precision.py
  • [x] roc.py
  • [x] specificity.py
  • [ ] specificity_sensitivity.py
  • [x] stat_scores.py
PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

SkafteNicki avatar Mar 31 '23 11:03 SkafteNicki

@SkafteNicki how is it going here? :rabbit:

Borda avatar Apr 17 '23 12:04 Borda

Hi @SkafteNicki, do you guys have any update on this one please?

samet-akcay avatar Aug 03 '23 10:08 samet-akcay

do you guys have any update on this one please?

We need to address failing checks

Borda avatar Aug 03 '23 16:08 Borda

@SkafteNicki how is it going here? :rabbit:

Borda avatar Aug 07 '23 20:08 Borda

Regarding #1989, I would consider dropping auto as it is: if your inputs happen to all fall in the (0, 1) range but are actually logits, you would end up with the inputs wrongly treated as probabilities... right?
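
For illustration, a minimal sketch of the ambiguity (plain PyTorch; the values are made up):

    import torch

    # Raw model outputs that are logits but happen to fall inside (0, 1).
    logits = torch.tensor([0.2, 0.9])

    # Read as probabilities (what auto-detection would do here),
    # thresholding at 0.5 yields predictions [0, 1].
    print((logits > 0.5).int())                 # tensor([0, 1], dtype=torch.int32)

    # Read as logits, sigmoid maps them to roughly [0.55, 0.71],
    # so thresholding at 0.5 yields predictions [1, 1].
    print((torch.sigmoid(logits) > 0.5).int())  # tensor([1, 1], dtype=torch.int32)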

Borda avatar Aug 25 '23 11:08 Borda

@SkafteNicki what's left here? :)

justusschock avatar Oct 20 '23 13:10 justusschock

I'd probably name things a bit differently and stick to string arguments only, e.g. no True, and instead of False have a "none" arg.

the resulting docstring would then be

input_format: str specifying the format of the input preds tensor. Can be one of:
            - ``'auto'``: automatically detect the format based on the values in the tensor. If all values
                are in the [0,1] range, we consider the tensor to be probabilities and only threshold the values.
                If all values are non-float, we consider the tensor to be labels and do nothing. Otherwise we
                consider the tensor to be logits and will apply sigmoid to the tensor and threshold the values.
            - ``'probs'``: preds tensor contains values in the [0,1] range and is considered to be probabilities.
                Only thresholding will be applied to the tensor, and values will be checked to be in the [0,1] range.
            - ``'logits'``: preds tensor contains values outside the [0,1] range and is considered to be logits. We
                will apply sigmoid to the tensor and threshold the values before calculating the metric.
            - ``'labels'``: preds tensor contains integer values and is considered to be labels. No formatting will
                be applied to the preds tensor.
            - ``'none'``: will disable all input formatting. This is the fastest option but also the least safe.
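
For illustration, a hypothetical usage sketch (the input_format keyword is only what this PR proposes and may not exist in released torchmetrics; BinaryAccuracy is an existing class):

    import torch
    from torchmetrics.classification import BinaryAccuracy

    preds = torch.randn(8)               # raw, unbounded model outputs (logits)
    target = torch.randint(0, 2, (8,))   # binary ground-truth labels

    # Hypothetical: declare the preds format explicitly instead of relying
    # on auto-detection, as proposed in this PR.
    metric = BinaryAccuracy(input_format="logits")
    print(metric(preds, target))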

justusschock avatar Dec 20 '23 15:12 justusschock

@SkafteNicki, how is it going? What is missing... :flags:

Borda avatar Dec 27 '23 20:12 Borda

Setting the PR as ready just to enable full testing...

Borda avatar Jan 09 '24 21:01 Borda

Codecov Report

Merging #1676 (a7d719b) into master (4c999b8) will decrease coverage by 0%. Report is 14 commits behind head on master. The diff coverage is 36%.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1676   +/-   ##
======================================
- Coverage      69%     69%   -0%     
======================================
  Files         307     307           
  Lines       17352   17406   +54     
======================================
+ Hits        11961   11992   +31     
- Misses       5391    5414   +23     

codecov[bot] avatar Jan 09 '24 23:01 codecov[bot]

Thanks for working on the bug in #1604! I suspect this bug is silently affecting people currently using torchmetrics.

It looks like the current solution is a new argument input_format that gives the user the option to specify whether they are providing probabilities, logits, etc. (see the documentation below). This almost fixes the problem, but the argument defaults to input_format='auto', which will cause exactly the bug discussed in #1604 (i.e. if your logits happen to all be in [0, 1], this will incorrectly fail to convert them to probabilities).

I see two options:

  1. Change the default to input_format='logits' (or 'probs', though I think logits is the most common input). This default option is explicit about behavior that is a bit subtle and will prevent the bug from happening by default.

  2. Get rid of the auto option altogether. The auto option will always be buggy if the user inputs logits.

I personally see no reason to include the auto option and would vote for option 2.

input_format: str specifying the format of the input preds tensor. Can be one of:

    - ``'auto'``: automatically detect the format based on the values in the tensor. If all values
        are in the [0,1] range, we consider the tensor to be probabilities and only threshold the values.
        If all values are non-float, we consider the tensor to be labels and do nothing. Otherwise we
        consider the tensor to be logits and will apply sigmoid to the tensor and threshold the values.
    - ``'probs'``: preds tensor contains values in the [0,1] range and is considered to be probabilities.
        Only thresholding will be applied to the tensor, and values will be checked to be in the [0,1] range.
    - ``'logits'``: preds tensor contains values outside the [0,1] range and is considered to be logits. We
        will apply sigmoid to the tensor and threshold the values before calculating the metric.
    - ``'labels'``: preds tensor contains integer values and is considered to be labels. No formatting will
        be applied to the preds tensor.
    - ``'none'``: will disable all input formatting. This is the fastest option but also the least safe.

idc9 avatar Feb 03 '24 15:02 idc9

@idc9 thanks for giving your opinion on this issue. I already laid out some of my opinions on Slack, but just for transparency, here are the two main reasons for having auto as the default:

  1. To keep everything backwards compatible
  2. To say "the auto option will always be buggy if the user inputs logits" is simply wrong. For reasonably sized input, any model should output some logit values outside the [0,1] range; logits inside [0,1] correspond, after the sigmoid transformation, to probabilities in the [0.5, 0.73] range. If a model only outputs values in the 0.5-0.73 probability range, it will never predict the negative class and is never confident about the positive class. Both seem highly unlikely for any well-trained model.
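
A quick check of the quoted range with plain PyTorch:

    import torch

    # Logits in [0, 1] map under sigmoid to probabilities in roughly [0.5, 0.73].
    print(torch.sigmoid(torch.tensor([0.0, 1.0])))  # tensor([0.5000, 0.7311])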

SkafteNicki avatar Feb 04 '24 13:02 SkafteNicki

@SkafteNicki In cases where GPU memory limits batch sizes to be very small, there will be no "reasonably sized input", so "auto" will be buggy. What about not having "auto", but rather having only "probs" and "logits", with "probs" being the default and verifying that its input is in [0, 1]? This would be backwards compatible, except that it would raise an error in the cases where the old version was doing the wrong thing. The error message could suggest that the user might mean input_format="logits".
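
A minimal sketch of that validation, assuming a hypothetical helper (the name and error message are illustrative, not from the PR):

    import torch

    def _format_binary_preds(preds: torch.Tensor, input_format: str = "probs") -> torch.Tensor:
        # Illustrative only: validate/convert preds according to input_format.
        if input_format == "probs":
            if preds.min() < 0 or preds.max() > 1:
                raise ValueError(
                    "preds contains values outside [0, 1]; if these are raw model "
                    "outputs, you may want input_format='logits'."
                )
            return preds
        if input_format == "logits":
            return torch.sigmoid(preds)
        raise ValueError(f"Unknown input_format: {input_format!r}")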

NoahAtKintsugi avatar Feb 05 '24 19:02 NoahAtKintsugi

Sorry I should have been more careful with my language! Let me rephrase “The auto option will always be buggy if the user inputs logits” to “The ‘auto’ option leaves open the possibility for the bug to happen when the user inputs logits”.

Since the auto option leaves open the possibility of this bug occurring, I suspect we don't want to preserve backwards compatibility. This is a subtle/unexpected issue that many users won't immediately realize can affect them. Given that it's straightforward for the user to specify logits/probs, I'm curious if there is any case where 'auto' is useful (i.e. should it just be removed)?

As @SkafteNicki pointed out the logits bug will only occur if all of the predicted probabilities lie in [0.5, 0.73]. It’s true this is likely not the usual scenario but it probably does come up. A few examples where it may be more common include: small batches, early phases of model training, difficult applications where hold-out probabilities are (sadly) closeish to 0.5, unbalanced classes, situations where your label prediction rule thresholds the probability at some value different from 0.5.

idc9 avatar Feb 06 '24 13:02 idc9

@NoahAtKintsugi and @idc9 Alright, I am willing to give in on this, but we are then going to break backwards compatibility, which I do not take lightly. If we do this change, we need to do it right. You are probably right that it is better to break things now and fix an issue that may be a hidden problem for some users.

@Borda and @justusschock please give some opinions on this. Do we

  • continue with input_format="auto" as the default, keeping everything backwards compatible
  • set input_format="auto" for v1.4 (next release) and include a user warning that it will change to input_format="logits"/"probs" (which one we choose does not matter to me) in v1.5
  • something else...

SkafteNicki avatar Feb 06 '24 13:02 SkafteNicki

  • set input_format="auto" for v1.4 (next release) and include a user warning that it will change to input_format="logits"/"probs" (which one we choose does not matter to me) in v1.5

I think due to stability we may add a warning now, but have the hard switch only from 2.0. cc: @lantiga

Borda avatar Feb 15 '24 19:02 Borda

Defaulting to logits might make the most sense. I suspect that it's more common for users to output logits than probabilities (e.g. https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html, https://github.com/huggingface/pytorch-image-models/blob/main/train.py, https://github.com/pytorch/examples/blob/main/imagenet/main.py), probably since this avoids a small bit of extra computation.

idc9 avatar Feb 20 '24 03:02 idc9

@SkafteNicki, how is this one going? :)

Borda avatar Mar 28 '24 21:03 Borda