ConvLab-2

[Feature] Update MultiWOZ dataset from 2.1 to 2.2

Open derekchen14 opened this issue 4 years ago • 11 comments

Given the release of MultiWOZ 2.2, it seems the baselines should all be retrained on the cleanest version of the dataset. Paper: https://www.aclweb.org/anthology/2020.nlp4convai-1.13/

derekchen14 avatar Jul 16 '20 20:07 derekchen14

Thanks! We've noticed MultiWOZ 2.2. We will add it if it is of high quality.

zqwerty avatar Jul 17 '20 01:07 zqwerty

Also would be great to support the new format (which will also make it easy to add SGD).

chris-boson avatar Jan 02 '21 10:01 chris-boson

We are planning to add many datasets (SchemaGuided, Taskmaster, etc.) using a unified format.

zqwerty avatar Jan 08 '21 03:01 zqwerty

Great that you're planning to add SGD and Taskmaster. Any updates on when they will be available?

tomolopolis avatar Mar 09 '21 16:03 tomolopolis

Actually, we have already processed SGD, Taskmaster, and other datasets. We will release them together with MultiWOZ 2.2 & 2.3 in a few days. Thanks!

zqwerty avatar Mar 10 '21 07:03 zqwerty

great stuff - looking forward to it!

tomolopolis avatar Mar 10 '21 09:03 tomolopolis

@tomolopolis SGD and Taskmaster are now available in the unified format; see #180.

zqwerty avatar Mar 10 '21 11:03 zqwerty

@zqwerty thanks for that. Are there plans to port (some of) the existing supported model implementations to the unified format, and then make the dataset configurable in each model, given the consistent format?

For example, some new modules might be:

convlab2/nlu/jointBERT/unified/nlu.py
convlab2/dst/comer/unified/dst.py
convlab2/policy/gdpl/unified/policy.py
convlab2/nlg/sclstm/unified/nlg.py
...
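
To make the idea concrete, here is a minimal sketch of a module whose dataset is chosen by name once every dataset shares one format. Everything in it is hypothetical and for illustration only: `Turn`, `LOADERS`, `register`, `UnifiedNLU`, the dataset keys, and the sample utterances are made-up names and data, not ConvLab-2 code or its actual unified format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    """One turn in a hypothetical shared dialogue format."""
    speaker: str      # "user" or "system"
    utterance: str


Dialogue = List[Turn]

# Hypothetical registry: dataset name -> loader returning dialogues in the shared shape.
LOADERS: Dict[str, Callable[[], List[Dialogue]]] = {}


def register(name: str):
    """Decorator that records a loader function under a dataset name."""
    def wrap(fn):
        LOADERS[name] = fn
        return fn
    return wrap


@register("multiwoz22")
def load_multiwoz22() -> List[Dialogue]:
    # Placeholder data; a real loader would read and convert the dataset files.
    return [[Turn("user", "I need a hotel in the centre."),
             Turn("system", "What price range would you like?")]]


@register("sgd")
def load_sgd() -> List[Dialogue]:
    # Placeholder data, as above.
    return [[Turn("user", "Book a table for two."),
             Turn("system", "Which restaurant?")]]


class UnifiedNLU:
    """Toy stand-in for a model module that only sees the shared format."""

    def __init__(self, dataset: str):
        if dataset not in LOADERS:
            raise ValueError(f"unknown dataset: {dataset!r}")
        self.dataset = dataset
        self.dialogues = LOADERS[dataset]()

    def user_utterances(self) -> List[str]:
        # The same code path works for every registered dataset.
        return [t.utterance for d in self.dialogues for t in d
                if t.speaker == "user"]


if __name__ == "__main__":
    for name in sorted(LOADERS):
        print(name, UnifiedNLU(name).user_utterances())
```

The point of the registry is that the model code never branches on the dataset; supporting a new dataset would only mean adding one loader that emits the shared shape.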

tomolopolis avatar Mar 12 '21 16:03 tomolopolis

@tomolopolis we will revise the unified data processing and support some of the most useful models. However, some models rely on a lot of dataset-specific processing that cannot be cleanly unified.

zqwerty avatar Mar 13 '21 07:03 zqwerty

@tomolopolis we have added MultiWOZ 2.2 and MultiWOZ-coref; see 34960ff in master. However, I deleted the previous commit in order to remove Git LFS, because of its limited download bandwidth. I noticed that you had already merged the previous pull request. I hope that does not cause you too much trouble.

zqwerty avatar Mar 14 '21 09:03 zqwerty

@zqwerty Thanks for adding those. No worries about deleting the previous commit; I can pull in the latest.

tomolopolis avatar Mar 14 '21 10:03 tomolopolis