
Average precision metric for binary classification

Open denmoroz opened this issue 3 years ago • 7 comments

Is your feature request related to a problem? Please describe.
ComputeModelStatistics provides ROC AUC, but Average Precision (areaUnderPR) is missing.

Describe the solution you'd like
It would be great to add it, as it is very useful for many binary classification tasks.

Additional context
None.

AB#1789611

denmoroz avatar May 12 '22 15:05 denmoroz

I think this is already supported under the name "AUC", but tagging in @imatiach-msft for exact details.

mhamilton723 avatar May 13 '22 21:05 mhamilton723

Hmm, getAUC calls areaUnderROC not areaUnderPR, so it should be ROC AUC not PR AUC.

Also, it is not clear what the difference is between areaUnderROC (MetricConstants.AreaUnderROCMetric) and AUC (MetricConstants.AucSparkMetric), since getAUC internally calls areaUnderROC from spark.mllib.evaluation.BinaryClassificationMetrics.

Are these two just synonyms for ROC AUC?

denmoroz avatar May 14 '22 06:05 denmoroz

"Also it not clear what the difference between areaUnderROC (MetricConstants.AreaUnderROCMetric) and AUC (MetricConstants.AucSparkMetric) as getAUC internally calls areaUnderROC from spark.mllib.evaluation.BinaryClassificationMetrics."

Yes, indeed, they are synonymous.

"Hmm, getAUC calls areaUnderROC not areaUnderPR, so it should be ROC AUC not PR AUC."

Perhaps we can rename this. Honestly "PR AUC" is used a lot less often than ROC AUC as I've personally seen. How would you prefer us to rename these? The current names were chosen several years ago for reasons that are no longer relevant (similarity to another Microsoft ML platform's metric names).

Usually when I see AUC (Area Under Curve) I assume it's for ROC (Receiver Operating Characteristic) already, PR AUC is used less often.

imatiach-msft avatar May 17 '22 16:05 imatiach-msft

"Perhaps we can rename this"

There is common agreement in the community that AUC = ROC AUC (areaUnderROC in Spark terms), so there is probably no need to rename anything. Instead, it would be nice to add PR AUC (areaUnderPR in Spark terms) and name it AP (average precision), for instance (naming is not my strong suit 😓). At least it would then follow LightGBM's metric naming: [image]

"Usually when I see AUC (Area Under Curve) I assume it's for ROC (Receiver Operating Characteristic) already, PR AUC is used less often."

Exactly!

"Honestly "PR AUC" is used a lot less often than ROC AUC as I've personally seen."

Indeed, but it depends on the task you are solving. ROC captures the trade-off between TPR and FPR, while PR captures the trade-off between precision and recall. PR may help with highly imbalanced datasets: you might have ROC AUC close to 1.0 but practically zero recall at the same time, whereas PR AUC is much more useful for such tasks.
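To make the imbalanced case concrete, here is a minimal pure-Python sketch. It is not SynapseML or Spark code: the helper names `roc_auc` and `average_precision` and the toy data are assumptions made up for illustration, and the step-wise average precision computed here may differ slightly from how Spark computes areaUnderPR.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. O(P * N), which is fine for a toy example.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@rank taken at the rank of each positive.

    Assumes all scores are distinct (no tie handling) and at least one positive.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` items
    return total / hits


# Hypothetical toy data: 990 negatives and 10 positives (about 1% positive).
# Negatives get scores 0.000, 0.001, ..., 0.989.
neg_scores = [i / 1000 for i in range(990)]
# Positives score higher than 940 negatives but lower than the top 50 negatives.
pos_scores = [0.9395 + k * 0.00001 for k in range(10)]

labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

print(f"ROC AUC: {roc_auc(labels, scores):.3f}")                      # 0.949
print(f"Average precision: {average_precision(labels, scores):.3f}")  # 0.097
```

In this toy setup every positive outranks about 95% of the negatives, so ROC AUC looks strong, yet the 50 negatives ranked above the positives keep precision low at every recall level, which is what average precision picks up.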

denmoroz avatar May 17 '22 16:05 denmoroz

@imatiach-msft I think area under PR is much better for imbalanced tasks.

mhamilton723 avatar May 18 '22 17:05 mhamilton723

@denmoroz, if you're satisfied with the response, could you please close the issue?

ppruthi avatar Jul 20 '22 19:07 ppruthi

@ppruthi 👋 Sorry, it is still unclear to me from the conversation above whether this feature will be implemented or not. I can certainly close the issue if it is not planned anytime soon.

denmoroz avatar Aug 01 '22 11:08 denmoroz