
Fleiss Kappa

cemde opened this issue on Mar 29 '22 · 9 comments

🚀 Feature

Fleiss Kappa

Wikipedia

Motivation

Fleiss Kappa is a measure of inter-rater agreement among $k$ raters. It is useful in many areas, for example when combining multiple measurements or in ensemble methods.
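
For reference, with $N$ items, $k$ raters and $c$ categories, and $n_{ij}$ the number of raters assigning item $i$ to category $j$, Fleiss Kappa is

$$
\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e},
\qquad
\bar{P} = \frac{1}{N}\sum_{i=1}^{N}\frac{\sum_{j=1}^{c} n_{ij}^{2} - k}{k(k-1)},
\qquad
\bar{P}_e = \sum_{j=1}^{c}\left(\frac{1}{Nk}\sum_{i=1}^{N} n_{ij}\right)^{2}.
$$

A value of $\kappa = 1$ indicates perfect agreement, while $\kappa \le 0$ indicates agreement no better than chance.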

Pitch

Add Fleiss Kappa as a metric. I implemented it myself a while ago, and I think it might be a nice addition to torchmetrics: https://github.com/cemde/FleissKappa

I am happy to give it a try, make the metric more torchmetrics-like, and open a PR.

cemde · Mar 29 '22

cool, @cemde are you willing to contribute this metric? :)

Borda · Mar 30 '22

@Borda I'll give it a go!

wisecornelius · Mar 30 '22

What should the design of the call signature be? For Cohen's kappa, the two raters are passed as the preds and target variables. With Fleiss Kappa we have N > 1 raters, so this is not possible. Furthermore, it is by nature an unsupervised metric, which raises the question of what the call signature for unsupervised metrics should look like; I couldn't find any in torchmetrics. We only need preds, but it might be good to accept target as an input as well, for compatibility with other metrics in MetricCollections.

cemde · Apr 18 '22

If you're going in this direction, it might be interesting to keep Krippendorff's Alpha in mind as well. We chose it over Fleiss Kappa because it can handle varying numbers of labelers per data point.

Not that I need it or anything, just as a note. We currently use Simpledorff for that.

tsteffek · May 08 '22
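
As a side note for readers, here is a minimal Simpledorff usage sketch on a long-format table (one row per item/annotator pair). The column names are illustrative, and the entry point is quoted from the project README as I remember it, so double-check against the repo:

```python
import pandas as pd
import simpledorff

# Long-format annotations: one row per (item, annotator) pair.
# Column names are illustrative, not prescribed by the library.
df = pd.DataFrame({
    "document_id": [1, 1, 2, 2, 3, 3],
    "annotator_id": ["a", "b", "a", "b", "a", "b"],
    "annotation": ["pos", "pos", "neg", "pos", "neg", "neg"],
})

alpha = simpledorff.calculate_krippendorffs_alpha_for_df(
    df,
    experiment_col="document_id",  # which item was rated
    annotator_col="annotator_id",  # who rated it
    class_col="annotation",        # the label assigned
)
print(alpha)
```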

@wisecornelius can I give it a stab, in case you have not started working on it already? cc: @Borda @SkafteNicki

krishnakalyan3 · Jul 18 '22

@krishnakalyan3 @Borda I have the background code ready. I am just waiting for a response on the call signature to finish it up.

cemde · Jul 28 '22

> I am just waiting for a response on the call signature to finish it up.

That would be great, just not sure what you mean by "call signature", like API?

Borda · Jul 30 '22

Most metrics are called with Metric.update(pred: torch.Tensor, target: torch.Tensor). This works for Cohen's Kappa because we have exactly two raters: one rater becomes pred, the other becomes target. With Fleiss Kappa we have K raters, so I suggest a call like Metric.update(ratings: torch.Tensor), with ratings having ... x K dimensions. As far as I can see, this would be the first metric to deviate from the Metric.update(pred: torch.Tensor, target: torch.Tensor, ...) pattern.

cemde · Jul 30 '22
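
For concreteness, here is a minimal sketch of what the proposed update(ratings) signature could look like. Everything below (the FleissKappa name, the num_classes argument, the (N, K) ratings layout with integer labels in [0, num_classes)) is an assumption for illustration, not the final torchmetrics API:

```python
import torch
from torchmetrics import Metric
from torchmetrics.utilities import dim_zero_cat


def fleiss_kappa(ratings: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Fleiss Kappa for an (N, K) integer tensor: N items rated by K raters."""
    n_items, n_raters = ratings.shape
    # counts[i, j] = number of raters that assigned item i to category j
    counts = torch.zeros(n_items, num_classes)
    counts.scatter_add_(1, ratings.long(), torch.ones(n_items, n_raters))
    # Mean per-item agreement P-bar
    p_i = ((counts**2).sum(dim=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement P-bar_e from the marginal category proportions
    p_j = counts.sum(dim=0) / (n_items * n_raters)
    p_e = (p_j**2).sum()
    return (p_bar - p_e) / (1 - p_e)


class FleissKappa(Metric):
    """Sketch of a metric whose update takes only ratings (no target)."""

    def __init__(self, num_classes: int, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes
        # List state, concatenated across processes in distributed settings
        self.add_state("ratings", default=[], dist_reduce_fx="cat")

    def update(self, ratings: torch.Tensor) -> None:
        self.ratings.append(ratings)

    def compute(self) -> torch.Tensor:
        return fleiss_kappa(dim_zero_cat(self.ratings), self.num_classes)


# Toy usage: 100 items, 5 raters, 3 categories
metric = FleissKappa(num_classes=3)
metric.update(torch.randint(0, 3, (100, 5)))
print(metric.compute())
```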

Hi @cemde, sorry for being silent on this issue.

I think it is fine for the call signature to be metric.update(ratings: torch.Tensor), since that is also what makes most sense to me :)

We just need to specify this in the documentation.

SkafteNicki · Aug 30 '22