Fleiss Kappa
🚀 Feature
Fleiss Kappa
Motivation
Fleiss Kappa is a measure of inter-rater agreement between $k$ raters. It is useful in many areas, for example when combining multiple measurements or in ensemble methods.
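For reference, the standard definition: with $N$ subjects, $k$ raters, and $c$ categories, let $n_{ij}$ be the number of raters who assigned subject $i$ to category $j$. Then

$$P_i = \frac{1}{k(k-1)} \sum_{j=1}^{c} n_{ij}(n_{ij}-1), \qquad \bar{P} = \frac{1}{N} \sum_{i=1}^{N} P_i, \qquad \bar{P}_e = \sum_{j=1}^{c} \left( \frac{1}{Nk} \sum_{i=1}^{N} n_{ij} \right)^{2}, \qquad \kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}.$$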
Pitch
Add Fleiss Kappa as a metric. I implemented it myself a while ago, but think it might be a nice addition to torchmetrics: https://github.com/cemde/FleissKappa
I am happy to give it a try, make the metric more torchmetrics-like, and open a PR.
Alternatives
Additional context
cool, @cemde are you willing to contribute this metric? :)
@Borda I'll give it a go!
What should the design of the call signature be? For Cohen's kappa, the two raters are passed through the preds and target variables. With Fleiss Kappa, we have N > 1 raters, so this is not possible. Further, it is by nature an unsupervised metric, which raises the question of the call signature for unsupervised metrics in general - I couldn't find any in torchmetrics. We only need preds, but it might be good to also accept target as input, for compatibility with other metrics in MetricCollections.
If you're going in this direction, it might be interesting to keep Krippendorff's Alpha in mind as well. We chose it over Fleiss Kappa because it can handle a varying number of labelers per data point.
Not that I need it or anything, just as a note. We currently use Simpledorff for that.
@wisecornelius can I give it a stab, in case you have not started working on it already? cc: @Borda @SkafteNicki
@krishnakalyan3 @Borda I have the background code ready. I am just waiting for a response on the call signature to finish it up.
That would be great, just not sure what you mean by "call signature", like API?
Most metrics are called as Metric.update(preds: torch.Tensor, target: torch.Tensor). This works for Cohen's Kappa because we have exactly two raters: one rater is preds and the other is target. With Fleiss Kappa, we have K raters, so that pattern does not apply. I therefore suggest a call like Metric.update(ratings: torch.Tensor), with ratings having ... x K dimensions. As far as I can see, this would be the first metric to deviate from the Metric.update(preds: torch.Tensor, target: torch.Tensor, ...) pattern.
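To illustrate, here is a minimal functional sketch of that signature, assuming ratings is an [N, K] tensor of integer category labels (K raters per sample, all samples rated by the same number of raters); the function name and the num_categories argument are illustrative, not the final torchmetrics API:

```python
import torch

def fleiss_kappa(ratings: torch.Tensor, num_categories: int) -> torch.Tensor:
    # ratings: [N, K] integer labels, one column per rater (illustrative signature)
    n, k = ratings.shape
    # [N, K] labels -> [N, C] counts of how many raters chose each category
    counts = torch.nn.functional.one_hot(ratings, num_categories).sum(dim=1).float()
    # Observed agreement: mean per-sample probability that two raters agree
    p_bar = ((counts * (counts - 1)).sum(dim=1) / (k * (k - 1))).mean()
    # Chance agreement from the marginal category proportions
    p_j = counts.sum(dim=0) / (n * k)
    p_e = (p_j**2).sum()
    return (p_bar - p_e) / (1 - p_e)

# e.g. three raters labeling four samples into categories {0, 1, 2}
ratings = torch.tensor([[0, 0, 1], [1, 1, 1], [2, 2, 0], [1, 1, 1]])
print(fleiss_kappa(ratings, num_categories=3))  # ~0.41
```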
Hi @cemde, sorry for being silent regarding this issue.
I think it is fine for the call signature to be metric.update(ratings: torch.Tensor),
since that is also what makes most sense to me :)
We just need to specify this in the documentation.
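To make that concrete, a torchmetrics-style module with this signature could look roughly like the sketch below (the class name, the num_categories argument, and the internal state handling are all illustrative, not the final implementation):

```python
import torch
from torchmetrics import Metric
from torchmetrics.utilities import dim_zero_cat

class FleissKappa(Metric):
    """Sketch only: assumes update(ratings) with ratings of shape [N, K]
    holding integer category labels and a fixed number of raters K."""

    def __init__(self, num_categories: int, **kwargs):
        super().__init__(**kwargs)
        self.num_categories = num_categories
        # Keep per-sample category counts so compute() can pool over all batches
        self.add_state("counts", default=[], dist_reduce_fx="cat")

    def update(self, ratings: torch.Tensor) -> None:
        # [N, K] labels -> [N, C] counts of raters per category
        counts = torch.nn.functional.one_hot(ratings, self.num_categories).sum(dim=1)
        self.counts.append(counts.float())

    def compute(self) -> torch.Tensor:
        counts = dim_zero_cat(self.counts)  # [N_total, C]
        k = counts[0].sum()                 # raters per sample
        p_bar = ((counts * (counts - 1)).sum(dim=1) / (k * (k - 1))).mean()
        p_j = counts.sum(dim=0) / counts.sum()
        p_e = (p_j**2).sum()
        return (p_bar - p_e) / (1 - p_e)
```

Because update takes a single ratings tensor, documenting the expected shape (and the deviation from the usual preds/target pattern) will be important, in particular for use inside MetricCollections.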