DAMO-ConvAI
Questions about reproducing/comparing with SpokenWOZ baselines
Hello,
I am currently trying to evaluate models that I trained on SpokenWOZ in order to compare to the baselines you reported in the paper. Doing this, I'm currently running into some issues:
- Which evaluation script should be used to report the results? I'm currently using this script from `space-word`, and I fail to get numbers close to the ones you report (20% lower than what you report for inform and success, while reaching a higher BLEU score). Also, which settings exactly do you use for the final evaluation (e.g. how do you set `same_eval_as_cambridge` and `use_true_domain_for_ctr_eval`)?
- Do you have the outputs or the trained model parameters of any of the baseline models available somewhere to verify the evaluation procedure? I need to adapt it to fit my code base and want to check that I get the same results as you.
- When trying to run the training of `space-word` myself, the training runs for a few iterations and then crashes because the `npy` file `SNG1724` is missing. I see that the dialog exists in the original data, but it is not preprocessed correctly for some reason. Do you have an explanation for this?
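
To narrow down the last point, a check along the following lines could list every dialog ID that has no preprocessed feature file. This is only a sketch: the helper name `find_missing_npy`, the flat directory layout, and the `<dialog_id>.npy` naming scheme are assumptions for illustration and are not taken from the repository.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_npy(dialog_ids: Iterable[str], feature_dir: str) -> List[str]:
    """Return the dialog IDs that have no `<dialog_id>.npy` file in feature_dir.

    Assumes (hypothetically) one .npy file per dialog, stored directly
    in feature_dir and named after the dialog ID.
    """
    root = Path(feature_dir)
    # Stems of all .npy files that exist, e.g. "SNG1724" for "SNG1724.npy".
    present = {p.stem for p in root.glob("*.npy")}
    # Keep only the IDs without a matching file, sorted for stable output.
    return sorted(d for d in dialog_ids if d not in present)
```

Calling it with the dialog IDs from the original data and the directory holding the preprocessed features would show whether `SNG1724` is the only dialog affected or one of several.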