multiwoz Evaluation Dataset in "Response Generation"

Evaluation Dataset in "Response Generation" - "end-to-end models" part

Open: nqchieutb01 opened this issue 1 year ago • 1 comment

Does this part use MultiWOZ 2.2 or MultiWOZ 2.0 as the benchmark dataset? I'm confused: the results in the original RewardNet, Mars, and KRLS papers are the same as those in your table, but they were all reported on the MultiWOZ 2.0 dataset. Moreover, in the TOATOD paper, the authors reported the combined score on the MultiWOZ 2.2 dataset. Is there a mistake? Could you explain this inconsistency? Thanks!

Sep 20 '23 03:09 nqchieutb01

I have the same doubt.

Nov 03 '23 16:11 comprehensiveMap
