Incorrect krippendorffs alpha result with missing value or missing data

Open createmomo opened this issue 3 years ago • 3 comments

from nltk import agreement

rater1 = [1, 1, 2]
rater2 = [1, 1, None]
rater3 = [None, 1, 2]

# Build (annotator_id, sample_id, label_id) triples.
# Note that str(None) turns a missing value into the string "None".
taskdata = (
    [[0, str(i), str(rater1[i])] for i in range(0, len(rater1))]
    + [[1, str(i), str(rater2[i])] for i in range(0, len(rater2))]
    + [[2, str(i), str(rater3[i])] for i in range(0, len(rater3))]
)
print(taskdata)  # (annotator_id, sample_id, label_id)
ratingtask = agreement.AnnotationTask(data=taskdata)

print("alpha " + str(ratingtask.alpha()))  # Krippendorff's alpha

This prints:

alpha 0.33333333333333337

I am not sure whether this result is correct. In my example, there are 3 items to annotate. Apart from the missing values (not every annotator labeled every item), all annotators provided the same labels, so I expected alpha to be 1. However, I got 0.33.

Does anyone know whether 0.33 or 1.0 is the correct value of Krippendorff's alpha here?

Thank you!

createmomo, Oct 23 '21 15:10

Hello!

Depending on your interpretation, both could be correct. Upon inspection, it seems that None is treated as a label in its own right rather than as a missing value. You can see this by replacing None with another label, e.g. "3":

from nltk.metrics.agreement import AnnotationTask

rater1 = [1, 1, 2]
rater2 = [1, 1, 3]
rater3 = [3, 1, 2]

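# Convert real labels to strings, but keep None as None. An explicit
# "is not None" test is used because a bare truthiness check would also
# leave a label of 0 unconverted.
taskdata = (
    [[0, str(i), str(rater1[i]) if rater1[i] is not None else None] for i in range(0, len(rater1))]
    + [[1, str(i), str(rater2[i]) if rater2[i] is not None else None] for i in range(0, len(rater2))]
    + [[2, str(i), str(rater3[i]) if rater3[i] is not None else None] for i in range(0, len(rater3))]
)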
print(taskdata)  # (annotator_id, sample_id, label_id)
ratingtask = AnnotationTask(data=taskdata)

print("alpha " + str(ratingtask.alpha()))  # krippendorffs alpha

also outputs

alpha 0.33333333333333337

There is indeed an argument that None values should be ignored, which would allow partial annotation, as you mentioned. This would be as simple as modifying this line: https://github.com/nltk/nltk/blob/ad3c84c792453a44f2d195e5263e72b315774478/nltk/metrics/agreement.py#L310

to be

            label_freqs = FreqDist(x["labels"] for x in itemdata if x["labels"] is not None)

Running the aforementioned program with this line change (and with the missing ratings given as None again rather than 3) results in:

alpha 1.0
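
For anyone who wants to check both numbers without patching NLTK, here is a minimal, self-contained sketch of Krippendorff's alpha for nominal labels. It is an illustration under assumptions, not NLTK's code: `nominal_alpha` and its `ignore_missing` flag are names made up for this example, and it assumes the nominal (match or mismatch) distance, which as far as I can tell is what `AnnotationTask` uses by default.

```python
from collections import Counter


def nominal_alpha(triples, ignore_missing=True):
    """Krippendorff's alpha with nominal distance (labels either match or differ).

    triples: iterable of (annotator_id, item_id, label); a label of None
    marks a missing annotation. This is a hypothetical helper, not an NLTK API.
    """
    # Collect the labels given to each item, optionally dropping missing ones.
    labels_by_item = {}
    for _annotator, item, label in triples:
        if ignore_missing and label is None:
            continue
        labels_by_item.setdefault(item, []).append(label)

    def disagreement(freqs):
        # Share of ordered label pairs whose two labels differ.
        n = sum(freqs.values())
        differing = sum(freqs[a] * freqs[b] for a in freqs for b in freqs if a != b)
        return differing / (n * (n - 1))

    pooled = Counter()
    observed_total = 0.0
    for labels in labels_by_item.values():
        if len(labels) < 2:
            continue  # an item with fewer than two labels yields no pairs
        freqs = Counter(labels)
        pooled += freqs
        observed_total += disagreement(freqs) * len(labels)

    n_pooled = sum(pooled.values())
    if n_pooled == 0:
        raise ValueError("no item has at least two labels")
    expected = disagreement(pooled)
    if expected == 0:
        return 1.0  # only one label value was ever used
    return 1.0 - (observed_total / n_pooled) / expected


raters = [[1, 1, 2], [1, 1, None], [None, 1, 2]]
triples = [(a, i, label) for a, labels in enumerate(raters) for i, label in enumerate(labels)]

print(nominal_alpha(triples, ignore_missing=True))   # missing values skipped
print(nominal_alpha(triples, ignore_missing=False))  # None counted as a label
```

With missing values skipped this gives 1.0; with None counted as a label it gives roughly 0.333, which matches the two results reported in this thread.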

That said, this change would only allow missing annotations for the alpha method, and (perhaps) not for the other metrics (e.g. pi). That's why I'm hesitant to open a pull request for this yet. I'm not very familiar with the agreement module, nor with the research on the topic. Perhaps someone has ideas on whether we should make changes here or not?

  • Tom Aarsen

tomaarsen, Oct 27 '21 19:10

This issue is very much related to #2732, which asks how Agreement Krippendorff's alpha handles missing values.

tomaarsen, Oct 27 '21 19:10

Hello, thank you so much for your kind response, and also the nice code demonstration! This is very helpful to me. Allowing missing data for alpha is exactly what I need.

As you mentioned, I also hope that someone with more experience in this topic can share more ideas. :)

createmomo, Oct 27 '21 21:10