user-satisfaction-simulation
different number of annotators
Hello, I was looking into the human annotations for the USS dataset and realized that different conversations are annotated by different numbers of annotators. May I know the reason for this, and how the number of utterances with specific annotation ratings in the table has been calculated?
Hello,
The final score of an utterance is determined by the majority annotation. For example, a data sample with annotations (3, 3, 4) translates to a score of 3, as 3 is the majority annotation.
We conduct additional labeling on entire conversations when inconsistencies arise from the initial annotators (i.e., when no majority can be determined). Thus, some data samples may receive more than three annotations.
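To illustrate the aggregation rule described above, here is a minimal Python sketch. The function name and the tie-handling detail (treating any label set without a single strict winner as "no majority") are my assumptions, not part of the released code:

```python
from collections import Counter

def aggregate_label(annotations):
    """Return the majority label, or None if no single label wins
    (in which case the conversation would be sent for extra labeling)."""
    counts = Counter(annotations).most_common()
    # A clear majority exists only if the top label strictly outnumbers the runner-up.
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]
    return None  # no majority -> additional annotation needed

print(aggregate_label([3, 3, 4]))  # -> 3
print(aggregate_label([2, 3, 4]))  # -> None (no majority; re-annotate)
```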
Thanks for the clarification! In that case, for the first conversation from MWOZ that has labels 3, 3, 2, why did you add another annotator even though the majority vote was already clear?
Thanks for your question! We added additional annotations to data provided by a small group of outlier annotators, i.e., those whose annotation score distribution seems inconsistent with that of most annotators, for example being more likely to give high scores. We did not remove these outlier annotators, because dialogue evaluation involves some degree of subjectivity.
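For clarity, a rough sketch of how such outlier annotators could be flagged. The z-score criterion, threshold, and function name below are only illustrative assumptions; the answer above does not specify the exact rule that was used:

```python
import numpy as np

def flag_outlier_annotators(scores_by_annotator, z_thresh=1.5):
    """Flag annotators whose mean score deviates strongly from the rest.

    `scores_by_annotator` maps an annotator id to the list of satisfaction
    scores that annotator assigned.
    """
    means = {a: np.mean(s) for a, s in scores_by_annotator.items()}
    overall = np.array(list(means.values()))
    mu, sigma = overall.mean(), overall.std()
    return [a for a, m in means.items()
            if sigma > 0 and abs(m - mu) / sigma > z_thresh]

# Example: annotator "A3" gives noticeably higher scores than the others.
example = {
    "A1": [3, 3, 2, 4, 3],
    "A2": [2, 3, 3, 3, 2],
    "A3": [5, 5, 4, 5, 5],
    "A4": [3, 2, 3, 3, 3],
}
print(flag_outlier_annotators(example))  # -> ['A3']
```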