UBAR-MultiWOZ: The context used in evaluation

Open · 311dada opened this issue 3 years ago • 3 comments

Should the dialogue context used in evaluation be the oracle (ground-truth) context or the generated one?

Apr 15 '21 14:04 311dada

In a realistic setting, the context should be generated.

Apr 25 '21 04:04 TonyNemo
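To make the oracle vs. generated distinction concrete, here is a minimal sketch of how a dialogue history could be assembled from either the gold system responses or the model's own earlier outputs. The function name `build_context` and the `USER:`/`SYSTEM:` labels are hypothetical, and only user utterances and responses are included; this is not UBAR's actual implementation, which also conditions on belief states, database results and system acts.

```python
from typing import List


def build_context(
    user_turns: List[str],
    gold_responses: List[str],
    generated_responses: List[str],
    use_gold: bool,
) -> str:
    """Hypothetical context builder: concatenate all previous turns plus the
    current user utterance, taking the system side from either the gold
    (oracle) responses or the model's generated responses."""
    responses = gold_responses if use_gold else generated_responses
    parts = []
    # Every user turn except the last one has already received a system reply.
    for user, resp in zip(user_turns[:-1], responses):
        parts.append(f"USER: {user}")
        parts.append(f"SYSTEM: {resp}")
    # The current user turn, which the model must respond to next.
    parts.append(f"USER: {user_turns[-1]}")
    return " ".join(parts)


users = ["hi", "book it"]
print(build_context(users, ["hello"], ["hey"], use_gold=True))
print(build_context(users, ["hello"], ["hey"], use_gold=False))
```

In the realistic (generated) setting, errors in earlier generated responses propagate into later contexts, which is why results under the two settings can differ.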

However, something strange happens when I evaluate your released checkpoint. In the Response Generation setup, I get the same result as in issue #4. Oddly, the result is somewhat lower when I give the model the gold context (gold responses). Specifically, I use the following command:

python train.py -mode test -cfg eval_load_path=$path use_true_prev_bspn=True use_true_prev_aspn=True use_true_db_pointer=True use_true_prev_resp=True use_true_curr_bspn=True use_true_curr_aspn=True use_all_previous_context=True cuda_device=0

With this, I get match: 96.10 success: 90.80 bleu: 22.06 score: 115.51 on the test split. Am I doing something wrong? Could you explain this? Thanks for your help!

Apr 25 '21 11:04 311dada
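As a quick consistency check on the numbers reported above, the sketch below assumes the standard MultiWOZ combined score, i.e. half the sum of the inform ("match") and success rates plus BLEU. The function name `combined_score` is hypothetical; the formula reproduces the reported 115.51 from the three component metrics.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Standard MultiWOZ combined score (assumed here): 0.5 * (inform + success) + BLEU."""
    return 0.5 * (inform + success) + bleu


# Metrics reported in the comment above (test split, gold-context run).
score = combined_score(inform=96.10, success=90.80, bleu=22.06)
print(f"{score:.2f}")
```

So the gap between the two settings, if any, has to come from the individual match, success or BLEU values rather than from how they are combined.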

I have the same question. @TonyNemo

Aug 25 '21 09:08 SkyAndCloud
