
Metrics redesign

Open djdameln opened this pull request 1 year ago • 7 comments

πŸ“ Description

  • Adds an Evaluator class, which replaces the MetricsCallback.
  • Similar to the PostProcessor, the Evaluator inherits from Callback to gain access to the val and test outputs, and from nn.Module so that the metrics are stored as a submodule of the Lightning model.
  • Adds an AnomalibMetric class, which adds a fields constructor argument and attribute to existing torchmetrics metrics. By inheriting from AnomalibMetric together with a given Metric subclass, a new Anomalib-compatible version of the metric is created. When instantiating the metric, the user specifies the batch fields that will be used to update the metric; during update, only the batch needs to be passed to the update method (see the rough sketch after the examples below).

Some examples:

from anomalib.models import Padim
from anomalib.metrics import AnomalibMetric, AUROC, F1Score
from anomalib.metrics.base import create_anomalib_metric

# A model-specific default set of metrics is provided by anomalib:
# >>> Padim.default_evaluator()
# Evaluator(
#   (val_metrics): ModuleList()
#   (test_metrics): ModuleList(
#     (0): AUROC()
#     (1): F1Score()
#     (2): AUROC()
#     (3): F1Score()
#   )
# )

# But we can also pass our own set of metrics:
image_f1_score = F1Score(fields=["pred_label", "gt_label"])
image_auroc = AUROC(fields=["pred_label", "gt_label"])
Padim(metrics=[image_f1_score, image_auroc])

# When passing multiple metrics of the same type, we need to provide a prefix:
image_f1_score = F1Score(fields=["pred_label", "gt_label"], prefix="image_")
pixel_f1_score = F1Score(fields=["pred_mask", "gt_mask"], prefix="pixel_")
Padim(metrics=[image_f1_score, pixel_f1_score])

# We can also use torchmetrics classes that are not available in Anomalib:
from torchmetrics import Accuracy

# Option 1: create a custom metric class
class AnomalibAccuracy(AnomalibMetric, Accuracy):
    pass

image_accuracy = AnomalibAccuracy(fields=["pred_label", "gt_label"])

# Option 2: use the create_anomalib_metric util function
AnomalibAccuracy = create_anomalib_metric(Accuracy)
image_accuracy = AnomalibAccuracy(fields=["pred_label", "gt_label"])
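
For context, here is a rough, self-contained sketch of the update mechanism described above. It is an illustration only, not the implementation from this PR; FieldsMixin, SketchF1Score, and Batch are made-up names for the example:

import torch
from dataclasses import dataclass
from torchmetrics.classification import BinaryF1Score

@dataclass
class Batch:
    pred_label: torch.Tensor
    gt_label: torch.Tensor

class FieldsMixin:
    """Stores the configured field names and resolves them against the batch on update."""

    def __init__(self, fields, **kwargs):
        super().__init__(**kwargs)
        self.fields = fields

    def update(self, batch):
        # read the configured fields from the batch and forward them to the torchmetrics update
        values = [getattr(batch, field) for field in self.fields]
        super().update(*values)

class SketchF1Score(FieldsMixin, BinaryF1Score):
    pass

metric = SketchF1Score(fields=["pred_label", "gt_label"])
metric.update(Batch(pred_label=torch.tensor([1, 0, 1]), gt_label=torch.tensor([1, 1, 1])))
print(metric.compute())  # F1 score over the accumulated batch(es)

The Anomalib-provided metrics described above would behave analogously, which is why only the batch needs to be passed to update().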

Other changes:

  • Pass **kwargs from the Lightning models to the AnomalyModule base class.
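
A minimal, hypothetical illustration of this kwargs pass-through (the import path follows the module shown in the coverage report below, and ExampleModel and its backbone argument are invented for the sketch; this is not the actual Padim signature):

from anomalib.models.components.base import AnomalyModule

class ExampleModel(AnomalyModule):
    """Hypothetical model showing how evaluator/metric kwargs reach the base class."""

    def __init__(self, backbone: str = "resnet18", **kwargs) -> None:
        # anything not consumed here is forwarded to AnomalyModule (e.g. metric-related arguments)
        super().__init__(**kwargs)
        self.backbone = backbone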

✨ Changes

Select what type of change your PR is:

  • [ ] 🐞 Bug fix (non-breaking change which fixes an issue)
  • [ ] 🔨 Refactor (non-breaking change which refactors the code base)
  • [x] 🚀 New feature (non-breaking change which adds functionality)
  • [x] 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] 📚 Documentation update
  • [ ] 🔒 Security update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • [x] 📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • [ ] 📚 I have made the necessary updates to the documentation (if applicable).
  • [ ] 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

For more information about code review checklists, see the Code Review Checklist.

djdameln avatar Sep 27 '24 17:09 djdameln


Codecov Report

Attention: Patch coverage is 94.19643% with 13 lines in your changes missing coverage. Please review.

Please upload report for BASE (feature/v2@95115f9). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/anomalib/metrics/evaluator.py 90.00% 5 Missing :warning:
src/anomalib/metrics/base.py 83.33% 4 Missing :warning:
.../anomalib/models/components/base/anomaly_module.py 84.21% 3 Missing :warning:
src/anomalib/data/validators/torch/image.py 83.33% 1 Missing :warning:
Additional details and impacted files
@@              Coverage Diff              @@
##             feature/v2    #2326   +/-   ##
=============================================
  Coverage              ?   78.56%           
=============================================
  Files                 ?      294           
  Lines                 ?    12878           
  Branches              ?        0           
=============================================
  Hits                  ?    10118           
  Misses                ?     2760           
  Partials              ?        0           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Sep 27 '24 20:09 codecov[bot]

@abc-125 @alexriedel1 @blaz-r @jpcbertoldo, if/when you have some time, would you like to review this PR? It introduces some major changes. It would be great to get your perspective.

samet-akcay avatar Sep 29 '24 06:09 samet-akcay

I'll check this tomorrow morning.

blaz-r avatar Oct 01 '24 21:10 blaz-r

I think this looks great 😄. I added some small comments.

I do have two main things to point out:

  1. Similar to the one from Ashwin: is there no way to pass the metrics just by name now (CLI, config)?
  2. About the default evaluator inside anomaly_module: this change might not be completely compatible with current configs, due to early stopping and the behavior of the existing validation (more details about this in the code comment).

@blaz-r, thanks for your review. This PR is part of a larger initiative, which might break things again; that's why we started to think of this as anomalib v2. Here is a proposal showing how the Anomalib base model would look:

Anomalib - Aux Operations Design.pptx

samet-akcay avatar Oct 03 '24 05:10 samet-akcay

Cool, thanks for sharing the info.

blaz-r avatar Oct 03 '24 09:10 blaz-r

Just on a side note, if you add AUPIMO to this, I would suggest a small change in the design.

Remove the return_average option that we added at the last minute, and instead create another class, AverageAUPIMO(AUPIMO).

It would have something like:

class AverageAUPIMO(AUPIMO):
    def compute(self):
        _, aupimo_result = super().compute()
        # normal images have NaN AUPIMO scores
        is_nan = torch.isnan(aupimo_result.aupimos)
        return aupimo_result.aupimos[~is_nan].mean()

which is currently done inside an if statement in AUPIMO.

jpcbertoldo avatar Oct 03 '24 14:10 jpcbertoldo

@samet-akcay @ashwinvaidya17 @blaz-r @jpcbertoldo I updated the PR based on your suggestions.

I simplified the AnomalyModule signature and logic by only allowing an Evaluator instance to be passed. To run the model without any evaluation, the user can pass False (as suggested by @samet-akcay).

I also moved the validation of the input metrics to the Evaluator class as suggested by @blaz-r.
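
A minimal usage sketch of the described change. The constructor argument name evaluator and the Evaluator's test_metrics keyword are assumptions based on this thread and the repr shown in the description, not confirmed signatures:

from anomalib.metrics import AUROC, Evaluator, F1Score
from anomalib.models import Padim

# "evaluator" and "test_metrics" are assumed names (see note above)
evaluator = Evaluator(
    test_metrics=[
        AUROC(fields=["pred_label", "gt_label"]),
        F1Score(fields=["pred_label", "gt_label"]),
    ]
)
model = Padim(evaluator=evaluator)

# run without any evaluation
model_without_eval = Padim(evaluator=False)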

djdameln avatar Oct 25 '24 12:10 djdameln

Just on a side note, if you add AUPIMO to this, I would suggest a small change in the design.

Remove the return_average option that we added at the last minute, and instead create another class, AverageAUPIMO(AUPIMO).

It would have something like:

class AverageAUPIMO(AUPIMO):
    def compute(self):
        _, aupimo_result = super().compute()
        # normal images have NaN AUPIMO scores
        is_nan = torch.isnan(aupimo_result.aupimos)
        return aupimo_result.aupimos[~is_nan].mean()

which is currently done inside an if statement in AUPIMO.

@jpcbertoldo Can you elaborate why this is needed? With this new design we still have access to any constructor arguments of the original metric, so the following would work:

aupimo = AUPIMO(fields=["anomaly_map", "gt_mask"], return_average=False)

djdameln avatar Oct 29 '24 12:10 djdameln

@jpcbertoldo Can you elaborate why this is needed? With this new design we still have access to any constructor arguments of the original metric, so the following would work:

aupimo = AUPIMO(fields=["anomaly_map", "gt_mask"], return_average=False)

It's not needed :) but it would be more obvious that one is using the average AUPIMO, and it would also make for a cleaner API, IMO. I suggested it here because this is already a redesign of the metrics, so it seemed appropriate.

jpcbertoldo avatar Oct 30 '24 10:10 jpcbertoldo

I think it looks good. My main concern is whether fields are mandatory to instantiate a metric class.

@samet-akcay

For most metrics, the fields are mandatory to instantiate the metric class. It's the only way to tell the metric which batch fields it should use to update its internal state. There can be some rare exceptions, such as PIMO, where the metric requires a fixed input type by design. In these cases the metric implementer can define the default fields for the metric as a class attribute, like I did here for AUPIMO.
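
As an illustration of that last point, a metric with a fixed input type could declare its defaults roughly like this. The attribute name default_fields and the wrapped-AUPIMO class are assumptions for the sketch, not confirmed in this thread:

from anomalib.metrics import AUPIMO, AnomalibMetric

class SketchAUPIMO(AnomalibMetric, AUPIMO):
    # assumed mechanism: the metric implementer declares the default batch fields
    default_fields = ["anomaly_map", "gt_mask"]

# users could then instantiate the metric without passing fields explicitly
aupimo = SketchAUPIMO()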

djdameln avatar Nov 04 '24 12:11 djdameln