ConvLab-2
RuleDST evaluation
Hi, could you please provide more information on how the Rule DST module is evaluated? Thanks
We did not evaluate the rule DST on its own, since it needs dialog acts as input. If you want to compare the rule DST with other DST models, you may use the gold dialog acts as input, or use an NLU model such as BERTNLU to parse both user and system utterances into acts.
I would like to use the output of BERTNLU as the input for the DST; however, it is not clear to me how to pass the data from one module to another, and I haven't found any code for that in ConvLab-2 so far.
Could you kindly link the ConvLab-2 page where this is described, or provide more information about this process?
You can refer to the Colab tutorial or the interface classes for NLU and DST. See PipelineAgent for how to build an agent from modules. Example usage: https://github.com/thu-coai/ConvLab-2/blob/master/tests/test_BERTNLU-RuleDST-RulePolicy-TemplateNLG.py
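To illustrate how a pipeline agent chains modules, here is a minimal sketch of the NLU-to-DST handoff. `StubNLU` and `StubDST` are hypothetical stand-ins for BERTNLU and RuleDST (the real classes live under `convlab2.nlu` and `convlab2.dst`); the act format shown is illustrative.

```python
class StubNLU:
    """Maps an utterance to dialog acts (stand-in for BERTNLU's predict)."""
    def predict(self, utterance):
        # A real NLU returns acts shaped like [intent, domain, slot, value].
        if "cheap" in utterance:
            return [["Inform", "Restaurant", "Price", "cheap"]]
        return []

class StubDST:
    """Accumulates dialog acts into a belief state (stand-in for RuleDST's update)."""
    def __init__(self):
        self.state = {"belief_state": {}}
    def update(self, dialog_acts):
        for intent, domain, slot, value in dialog_acts:
            if intent == "Inform":
                self.state["belief_state"].setdefault(domain, {})[slot] = value
        return self.state

# The pipeline simply feeds one module's output into the next:
nlu, dst = StubNLU(), StubDST()
acts = nlu.predict("I want a cheap restaurant")
state = dst.update(acts)
print(state["belief_state"])  # {'Restaurant': {'Price': 'cheap'}}
```

The real PipelineAgent does essentially this in its `response` method, with policy and NLG stages after the DST.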
Thank you for the info. Nevertheless, the Colab tutorial covers an end-to-end evaluation (NLU + DST + NLG). What if I would like to evaluate NLU + DST only, in order to check whether the defined rules are adequate or need improvement? Is that possible? Thanks again
Sure. Just feed the output of NLU to DST:
https://github.com/thu-coai/ConvLab-2/blob/ad32b76022fa29cbc2f24cbefbb855b60492985e/convlab2/dialog_agent/agent.py#L122-L132
From the code you posted, it doesn't seem that the module is evaluated with F1 scores or a similar measure... perhaps I don't understand your point...
Sorry, I thought you needed instructions on how to pass the output of NLU to DST. If you want to evaluate NLU + DST, you can write a script to: 1) read the original data; 2) pass the utterances to the NLU to get the user dialog acts; 3) pass the user dialog acts to RuleDST to get the predicted state; 4) compare the predictions with the references.
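The four steps above can be sketched as a small evaluation loop. Everything here is illustrative: the inline toy corpus, `nlu_predict`, and `rule_dst_update` are hypothetical stand-ins for reading the dataset and calling BERTNLU and RuleDST, not ConvLab-2 APIs.

```python
# Step 1: read the original data (here, an inline toy corpus of user turns).
dialogs = [
    {"utterance": "a cheap place please",
     "gold_state": {"Restaurant": {"Price": "cheap"}}},
    {"utterance": "hello",
     "gold_state": {}},
]

def nlu_predict(utterance):
    # Step 2: stand-in for BERTNLU; returns [intent, domain, slot, value] acts.
    return [["Inform", "Restaurant", "Price", "cheap"]] if "cheap" in utterance else []

def rule_dst_update(state, acts):
    # Step 3: stand-in for RuleDST; folds Inform acts into the belief state.
    for intent, domain, slot, value in acts:
        if intent == "Inform":
            state.setdefault(domain, {})[slot] = value
    return state

# Step 4: compare predictions with references (joint accuracy shown here).
correct = 0
for turn in dialogs:
    predicted = rule_dst_update({}, nlu_predict(turn["utterance"]))
    correct += predicted == turn["gold_state"]
joint_accuracy = correct / len(dialogs)
print(joint_accuracy)  # 1.0 on this toy corpus
```

With the real modules you would replace the stubs with `model.predict(...)` and `dst.update(...)` calls and iterate over the annotated dataset instead of the toy list.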
Refer to https://github.com/thu-coai/ConvLab-2/blob/master/convlab2/dst/evaluate.py for the DST metrics.
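For reference, the two metrics typically reported for DST are joint goal accuracy over turns and slot-level precision/recall/F1 over (domain, slot, value) triples. The sketch below mirrors the spirit of that evaluation script but is not its code; the nested `{domain: {slot: value}}` state shape is an assumption.

```python
def flatten(state):
    """Turn a nested {domain: {slot: value}} state into a set of triples."""
    return {(d, s, v) for d, slots in state.items() for s, v in slots.items()}

def dst_metrics(preds, golds):
    """Return (joint goal accuracy, slot F1) over aligned turn lists."""
    joint = sum(p == g for p, g in zip(preds, golds)) / len(golds)
    tp = fp = fn = 0
    for p, g in zip(preds, golds):
        ps, gs = flatten(p), flatten(g)
        tp += len(ps & gs)   # slots predicted and present in the reference
        fp += len(ps - gs)   # slots predicted but absent from the reference
        fn += len(gs - ps)   # reference slots the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return joint, f1

preds = [{"Restaurant": {"Price": "cheap"}}, {"Hotel": {"Area": "north"}}]
golds = [{"Restaurant": {"Price": "cheap"}}, {"Hotel": {"Area": "south"}}]
joint, f1 = dst_metrics(preds, golds)
print(joint, f1)  # 0.5 0.5
```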
OK, thanks, I'll try it this way!