
Add Precision-Recall-Gain curve, Area Under Precision Recall Gain curve, and FGain1 score

siemdejong opened this issue 1 year ago • 5 comments

🚀 Feature

Add Precision-Recall-Gain (PRG) curve as a new feature with the same interface as the Precision-Recall (PR) curve.

Along with the PRG curve, the Area Under the Precision-Recall-Gain curve (AUPRG) can be calculated, as is done for AveragePrecision.

The FGain1 score (FG1) is the gain-transformed F1 score; its isometrics are straight lines parallel to the minor diagonal in PRG space. This could be added as well.

Motivation

The PR curve has some caveats as described in [1]. PRG aims to fix these problems:

  1. baselines are non-universal
  2. interpolation is non-linear
  3. F-isometrics are non-linear
  4. Pareto-front is non-convex
  5. Area under PR curve does not relate to the expected F + there is an unachievable region

In particular, the area under the PR curve is shown to sometimes favour models with lower F1 scores. Using the PRG curve should therefore lead to better model selection.

Pitch

A Torchmetrics implementation of the PRG curve that has the same interface as the PR curve would aid in better model selection.

>>> import torch
>>> pred = torch.tensor([0, 0.1, 0.8, 0.4])
>>> target = torch.tensor([0, 1, 1, 0])
>>> prg_curve = PrecisionRecallGainCurve(task="binary")
>>> precision_gain, recall_gain, thresholds = prg_curve(pred, target)
>>> precision_gain
tensor([1.0000, 0.0000, 0.5000, 0.0000])
>>> recall_gain
tensor([0.0000, 0.0000, 1.0000, 1.0000])
>>> thresholds
...

Precision-Gain (PG) and Recall-Gain (RG) can be calculated as

$$ PG = 1 - \frac{tp + fn}{fp + tn} \cdot \frac{fp}{tp}, $$

and

$$ RG = 1 - \frac{tp + fn}{fp + tn} \cdot \frac{fn}{tp}. $$
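A minimal sketch of these two formulas in plain Python (the helper name is illustrative, not an existing torchmetrics API):

```python
def precision_recall_gain(tp, fp, fn, tn):
    """Compute precision gain and recall gain from confusion-matrix counts.

    Illustrative helper, not part of torchmetrics. Both gains share the
    factor pi / (1 - pi), the prior odds of the positive class.
    """
    prior_odds = (tp + fn) / (fp + tn)  # pi / (1 - pi)
    precision_gain = 1.0 - prior_odds * fp / tp
    recall_gain = 1.0 - prior_odds * fn / tp
    return precision_gain, recall_gain

# Balanced example: pi = 0.5, so the prior odds equal 1
pg, rg = precision_recall_gain(tp=40.0, fp=10.0, fn=10.0, tn=40.0)
# pg = 1 - 10/40 = 0.75 and rg = 1 - 10/40 = 0.75
```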

AUPRG can be calculated as is done for AveragePrecision, but only accounting for the area where PG & RG $\in [0, 1]$.
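A hedged sketch of that area computation via the trapezoidal rule (the function name is hypothetical; it assumes the curve points are sorted by increasing recall gain and simply clips everything to the unit square):

```python
def auprg_trapezoid(precision_gain, recall_gain):
    """Area under the PRG curve via the trapezoidal rule.

    Hypothetical helper, not an existing torchmetrics API: expects
    (recall_gain, precision_gain) pairs sorted by increasing recall gain,
    and clips both coordinates to [0, 1] so only the valid region counts.
    """
    clip = lambda v: min(max(v, 0.0), 1.0)
    area = 0.0
    for i in range(1, len(recall_gain)):
        x0, x1 = clip(recall_gain[i - 1]), clip(recall_gain[i])
        y0, y1 = clip(precision_gain[i - 1]), clip(precision_gain[i])
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# A curve that stays at PG = 1 across RG in [0, 1] covers the whole square
print(auprg_trapezoid([1.0, 1.0], [0.0, 1.0]))  # 1.0
```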

FG1 can be calculated as

$$ FG_1 = \frac{1}{2} PG + \frac{1}{2} RG. $$
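This identity can be checked numerically: [1] maps a metric $m$ to its gain $(m - \pi) / ((1 - \pi) m)$, where $\pi$ is the positive-class prevalence, and applying that transform to the plain F1 score gives exactly the average of PG and RG (the `gain` helper below is illustrative):

```python
def gain(metric, pi):
    """Gain transform from [1]: maps a metric in [pi, 1] onto [0, 1]."""
    return (metric - pi) / ((1.0 - pi) * metric)

# Example counts: tp=30, fp=20, fn=10, tn=40 -> pi = 40/100 = 0.4
tp, fp, fn, tn = 30.0, 20.0, 10.0, 40.0
pi = (tp + fn) / (tp + fp + fn + tn)

f1 = 2 * tp / (2 * tp + fp + fn)  # classic F1
pg = gain(tp / (tp + fp), pi)     # precision gain
rg = gain(tp / (tp + fn), pi)     # recall gain

# FG1 via the gain transform of F1 equals the average of the two gains
assert abs(gain(f1, pi) - 0.5 * (pg + rg)) < 1e-9
```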

It would be even more awesome if PRG could be extended to the multiclass/multilabel case.

Alternatives

The original authors of [1] have developed a package, pyprg (whose dependencies are out of date):

pip install pyprg

Then,

from prg import prg

# targets: ground-truth labels, predictions: model scores
prg_curve = prg.create_prg_curve(labels=targets, scores=predictions)
precision_gain = prg_curve["precision_gain"]
recall_gain = prg_curve["recall_gain"]
auprg = prg.calc_auprg(prg_curve)

Additional context

[1] Flach, P. & Kull, M. "Precision-Recall-Gain Curves: PR Analysis Done Right", NeurIPS 2015. http://people.cs.bris.ac.uk/~flach/PRGcurves/PRcurves.pdf

siemdejong avatar May 11 '23 14:05 siemdejong

Hi! Thanks for your contribution, great first issue!

github-actions[bot] avatar May 11 '23 14:05 github-actions[bot]

Hi @siemdejong, thanks for raising this issue. A couple of questions maybe:

  • How commonly is this metric used? I have not heard of or seen it in any papers I have read.
  • It is good that there is a package to compare against if we make our own implementation, but I see issues like https://github.com/meeliskull/prg/issues/7 and wonder how stable that implementation is.

SkafteNicki avatar May 12 '23 05:05 SkafteNicki

  • The metric is not (yet) commonly used. Obvious reasons might be that 1) people simply do not know about it, 2) it takes an extra step to calculate the plot, 3) no good implementation is available.
  • I have not tested the implementation thoroughly, so I cannot make arguments on the stability of the official implementation.

For another writeup about gain metrics, see https://snorkel.ai/improving-upon-precision-recall-and-f1-with-gain-metrics/

Maybe an interesting discussion on scikit-learn and gain metrics: https://github.com/scikit-learn/scikit-learn/pull/24121

siemdejong avatar May 12 '23 06:05 siemdejong

Hi, can I contribute to this issue?

arijitde92 avatar May 25 '23 10:05 arijitde92

Hi @arijitde92, feel free to make a contribution on this topic :)

SkafteNicki avatar May 25 '23 12:05 SkafteNicki