
matthews_correlation returning 0 on perfect correlation

Open yoavkatz opened this issue 1 year ago • 11 comments

Why is this the accepted behavior (strict=False was set a long time ago)?


The results of running the main metric used in the card (matthews_correlation) over simulated predictions that are equal to the references return a different score than expected. One would expect a perfect score of 1.0 in this case, but the returned metric score was 0.0. This is flagged only as a warning because strict=False was set in the call to test_card(). The predictions passed to the metric were: ['acceptable', 'acceptable', 'acceptable']


yoavkatz avatar Jan 01 '24 07:01 yoavkatz

reproducible via prepare.card.cola.py

dafnapension avatar Jan 07 '24 19:01 dafnapension

For the case in question, where predictions = references = ['acceptable', 'acceptable', 'acceptable'], by the book, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)): we only have TP here (or only TN), all three other components are 0, so the end result is 0.0.

And an elaborated proof "by hand" (attached image: mcc2).
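For reference, a minimal sketch (not part of the original thread) that reproduces the same 0.0 with scikit-learn directly:

```python
# Identical, single-class predictions and references: TP = 3, FP = FN = TN = 0,
# so the MCC formula degenerates to 0/0 and scikit-learn returns 0.0.
from sklearn.metrics import matthews_corrcoef

references = ["acceptable", "acceptable", "acceptable"]
predictions = ["acceptable", "acceptable", "acceptable"]

print(matthews_corrcoef(references, predictions))  # 0.0
```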

dafnapension avatar Jan 08 '24 21:01 dafnapension

The HF metric calls scikit-learn:

https://huggingface.co/spaces/evaluate-metric/matthews_correlation/blame/0da51560adeb410656ba31b4cd1807c990898398/matthews_correlation.py

```python
from sklearn.metrics import matthews_corrcoef


def _compute(self, predictions, references, sample_weight=None):
    return {
        "matthews_correlation": float(
            matthews_corrcoef(references, predictions, sample_weight=sample_weight)
        ),
    }
```

[sklearn.metrics.matthews_corrcoef](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html)

yoavkatz avatar Jan 09 '24 08:01 yoavkatz

I think the issue is that in v=[0,0,0] or v=[1,1,1] there is only a single class. This is a special case not treated in the implementation.

yoavkatz avatar Jan 09 '24 08:01 yoavkatz

This seems to be a known issue that has a PR, but it was not fixed.

https://github.com/scikit-learn/scikit-learn/issues/25258

yoavkatz avatar Jan 09 '24 08:01 yoavkatz

scikit-learn's implementation faithfully follows the definition (there is only TN or only TP, and the other three components are 0, hence the result, by definition of matthews_corrcoef, is 0). The question is whether, for our case, when we 'fake' a full hit or a full miss to test a metric, we should tweak the fake.

dafnapension avatar Jan 09 '24 08:01 dafnapension

Right. The metric is ill-defined in this case (0/0). They suggest in the above issue to add a special flag for this, but they did not solve it yet.

Can you repeat the above code with ref and pred each enumerating over (0,0), (0,1), (1,0), and (1,1) independently? I want to see all the corner cases.

| pred | ref | expected result |
|------|-----|-----------------|
| (0,0) | (1,1) | 0 |
| (1,1) | (1,1) | 1 |
| (0,0) | (0,0) | 1 |
| (1,1) | (0,0) | 0 |

yoavkatz avatar Jan 09 '24 08:01 yoavkatz

Gladly. I think that in all of your cases there is only a single input term that is 2, and the three others are 0, so the numerator is 0 in all of your cases:

(attached image: mcc_yoavs_corners)
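A quick sketch (not from the thread) that enumerates these corner cases with scikit-learn, assuming binary 0/1 labels:

```python
# Print the MCC for every (pred, ref) combination from the table above;
# constant vectors on either side give 0.0 because the denominator is 0.
from itertools import product
from sklearn.metrics import matthews_corrcoef

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for pred, ref in product(pairs, repeat=2):
    print(pred, ref, matthews_corrcoef(list(ref), list(pred)))
```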

dafnapension avatar Jan 09 '24 09:01 dafnapension

Ok. So we should add a check: if all the predictions are the same value (p) and all the references are the same value (r), we return 0 if p != r and 1 if p == r (a sketch of such a check follows below the list).

Can you also check that all these are between 0 and 1?

(1,0) (1,1)
(0,1) (1,1)
(1,0) (0,0)
(0,1) (0,0)
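A hedged sketch of what that check could look like (illustrative names, not the actual unitxt code):

```python
from sklearn.metrics import matthews_corrcoef

def mcc_with_degenerate_check(references, predictions):
    # When both sides are constant, the MCC formula degenerates to 0/0,
    # so decide directly: full agreement -> 1.0, full disagreement -> 0.0.
    if len(set(references)) == 1 and len(set(predictions)) == 1:
        return 1.0 if predictions[0] == references[0] else 0.0
    return float(matthews_corrcoef(references, predictions))
```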

yoavkatz avatar Jan 09 '24 11:01 yoavkatz

total loss for Matthews is -1, not 0:

(attached image: mcc_yoavs_corners)

I think that, since Matthews returns 0 by definition for any case where the numerator in the formula is 0 (namely, (either TP or TN is 0) and (either FP or FN is 0)), no matter how nice the predictions are, I suggest adding a warning message in such a case rather than overriding Matthews.
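Illustrative only (not the actual unitxt code), one way such a warning could be raised:

```python
import warnings

from sklearn.metrics import matthews_corrcoef

def mcc_with_degenerate_warning(references, predictions):
    score = float(matthews_corrcoef(references, predictions))
    # If either side contains only one class, the numerator TP*TN - FP*FN is
    # trivially 0, so MCC is 0 no matter how good the predictions are.
    if len(set(references)) == 1 or len(set(predictions)) == 1:
        warnings.warn(
            "matthews_correlation is 0 by definition when the references or "
            "the predictions contain a single class; the score is not informative."
        )
    return score
```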

dafnapension avatar Jan 09 '24 12:01 dafnapension

Yes, you are right - as this is a correlation, [1,0] and [0,1] are indeed anti-correlated (-1).

You can see what they did in f1 (and what they plan to do for Matthews) here:

https://github.com/scikit-learn/scikit-learn/pull/25531/files

```
zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"
    Sets the value to return when there is a zero division, i.e. when all
    predictions and labels are negative. If set to "warn", this acts as 0,
    but warnings are also raised.
```
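For illustration, a small sketch of how the existing flag behaves in f1_score (assuming a recent scikit-learn; np.nan support for zero_division was only added in later versions):

```python
import numpy as np
from sklearn.metrics import f1_score

# All labels and predictions are negative, so precision and recall are 0/0.
y_true = [0, 0, 0]
y_pred = [0, 0, 0]

print(f1_score(y_true, y_pred))                        # 0.0 + UndefinedMetricWarning
print(f1_score(y_true, y_pred, zero_division=1.0))     # 1.0
print(f1_score(y_true, y_pred, zero_division=np.nan))  # nan
```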

However, we have no use for warnings. No one sees them, as the results are stored and viewed in a report. So we could return np.nan - but it would be odd for a perfect prediction to return a correlation of nan.

yoavkatz avatar Jan 09 '24 13:01 yoavkatz