RL4LMs

Question about the classifier used for IntentAccuracyDailyDialog.

Open zhangjf-nlp opened this issue 2 months ago • 0 comments

According to the source code of class IntentAccuracyDailyDialog(BaseMetric), the intent likelihood of utterances on DailyDialog is computed by rajkumarrrk/roberta-daily-dialog-intent-classifier.

However, according to the config.json of this classifier, it is used for emotion classification, with four labels: joy, optimism, anger, and sadness, while the intent labels on DailyDialog should be Inform, Questions, Directives, and Commissive instead.
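For anyone who wants to repeat this check, here is a minimal sketch of comparing the label names in a `config.json` against the DailyDialog intent names. The `id2label` key is the standard Hugging Face config field. The helper names and the index order in the inline config are my own assumptions for illustration, and only the set of names matters for the comparison. A mismatch does not prove the classification head is wrong, since `id2label` may simply be inherited from a base checkpoint.

```python
import json

# Intent names as listed in this issue (lower-cased for comparison).
INTENT_LABELS = {"inform", "questions", "directives", "commissive"}


def label_names(config):
    """Return the lower-cased set of label names declared in a config dict."""
    return {str(name).lower() for name in config.get("id2label", {}).values()}


def looks_like_intent_classifier(config):
    """True only if the declared label names are exactly the intent names."""
    return label_names(config) == INTENT_LABELS


# Inline stand-in for the downloaded config.json.
# The index-to-name order here is illustrative, not copied from the real file.
config = json.loads(
    '{"id2label": {"0": "joy", "1": "optimism", "2": "anger", "3": "sadness"}}'
)

print(sorted(label_names(config)))
print(looks_like_intent_classifier(config))
```

With a real checkout, `config` would instead come from `json.load(open(path))` on the downloaded file.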

So my question is: has this classifier actually been fine-tuned for intent classification on DailyDialog utterances?

Empirically, I observe that this classifier's predictions on the ground-truth utterances in DailyDialog are unbalanced and poorly aligned with the labelled intent distribution, as shown below.

  • classification results on test set
| | label-0 | label-1 | label-2 | label-3 | Intent Accuracy |
|---|---|---|---|---|---|
| classification on ground truth | 0.7102 | 0.0055 | 0.0275 | 0.2071 | 0.6147 |
| intent labels in DailyDialog | 0.4988 | 0.2231 | 0.1565 | 0.1213 | - |
| classification on SFT generation | 0.5363 | 0.1591 | 0.0944 | 0.2100 | 0.4034 |
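A rough sketch of how numbers like these can be computed and compared, assuming predictions and references are available as lists of integer label ids. The function names are hypothetical and this is not the RL4LMs implementation. The total variation distance is my own choice of mismatch measure, and it is only meaningful if label-k refers to the same intent in both rows, which is exactly what is in doubt here.

```python
from collections import Counter


def label_distribution(label_ids, num_labels=4):
    """Fraction of items assigned to each label id 0..num_labels-1."""
    counts = Counter(label_ids)
    total = len(label_ids)
    return [counts.get(i, 0) / total for i in range(num_labels)]


def intent_accuracy(predicted, reference):
    """Share of utterances whose predicted label id equals the reference id."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def total_variation(p, q):
    """Half the L1 distance between two label distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Rows copied from the table in this issue (label-0 .. label-3).
predicted_on_ground_truth = [0.7102, 0.0055, 0.0275, 0.2071]
labelled_intents = [0.4988, 0.2231, 0.1565, 0.1213]
predicted_on_sft = [0.5363, 0.1591, 0.0944, 0.2100]

print(round(total_variation(predicted_on_ground_truth, labelled_intents), 4))
print(round(total_variation(predicted_on_sft, labelled_intents), 4))
```

The first row of the table sums to about 0.95 rather than 1, so the distance for that row is approximate.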

zhangjf-nlp · Apr 24 '24 11:04