
Questions about reproducing/comparing with SpokenWOZ baselines

Open ArneNx opened this issue 5 months ago • 8 comments

Hello,

I am trying to evaluate models that I trained on SpokenWOZ so that I can compare them to the baselines reported in the paper. I'm running into a few issues:

  1. Which evaluation script should be used to report the results? I'm currently using this script from space-word, but I can't get close to the numbers you report: inform and success come out about 20% lower than yours, while BLEU is higher. Also, which exact settings do you use for the final evaluation (e.g. how do you set same_eval_as_cambridge and use_true_domain_for_ctr_eval)?
  2. Are the outputs or trained parameters of any of the baseline models available somewhere, so that I can verify the evaluation procedure? I need to adapt it to my code base and want to check that I get the same results as you.
  3. When I run the space-word training myself, it runs for a few iterations and then crashes because the npy file for SNG1724 is missing. The dialog exists in the original data, but for some reason it is not preprocessed correctly. Do you have an explanation for this?
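
For item 3, a minimal sketch of a check that lists every dialog without a preprocessed feature file before training starts. It assumes one `<dialog_id>.npy` file per dialog in a single directory; the function name `find_missing_npy` and the directory layout are hypothetical and not taken from the space-word code.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_npy(dialog_ids: Iterable[str], feature_dir: str) -> List[str]:
    """Return the dialog IDs that have no `<dialog_id>.npy` file in feature_dir.

    Assumption: features are stored flat, one file per dialog, named after
    the dialog ID (e.g. "SNG1724.npy"). Adjust if the real layout differs.
    """
    root = Path(feature_dir)
    # Stems of all .npy files present, e.g. "SNG1724" for "SNG1724.npy".
    present = {p.stem for p in root.glob("*.npy")}
    # Keep only the requested IDs that have no matching file.
    return sorted(d for d in dialog_ids if d not in present)
```

Running this over the dialog IDs of the training split would show whether SNG1724 is the only affected dialog or one of several, instead of finding them one crash at a time.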

ArneNx • Jan 22 '24 13:01