knowledge-driven-dialogue-lic2019
seq2seq missing file
./seq2seq/data/invalid_kg.json
and ./seq2seq/outputs/20190511/valid/2019-5-1-transformer-8w_step_9000.pt
seem to be missing. Could you add them, or tell me how to generate them?
Also, I got an "unrecognized arguments" error when I ran preclean.py
with the default config:
usage: preclean.py [-h] [--log_file LOG_FILE]
[--raw_train_file RAW_TRAIN_FILE]
[--raw_dev_file RAW_DEV_FILE] [--save_dir SAVE_DIR]
[--raw_test_file RAW_TEST_FILE]
preclean.py: error: unrecognized arguments: --train_sample_file_save data/train_sample.txt --dev_sample_file_save data/dev_sample.txt --test_sample_file_save data/test_sample.txt --train_text_file_save data/train.src --dev_text_file_save data/dev.src --test_text_file_save data/test.src --train_topic_file_save data/train_topic.txt --dev_topic_file_save data/dev_topic.txt --test_topic_file_save data/test_topic.txt --train_tgt_file data/train.tgt --dev_tgt_file data/dev.tgt --test_tgt_file data/test.tgt
It runs successfully when I change ./seq2seq/config/preclean.yml
as follows:
raw_train_file: "data/train.txt"
raw_dev_file: "data/dev.txt"
raw_test_file: "data/test.txt"
log_file: "outputs/log/log.txt"
save_dir: "data/"
Could you check whether the default configuration is wrong, or whether I am missing a step that causes this problem?
Thank you!
@AngusMonroe
- invalid_kg.json contains the knowledge that was filtered out by hand-written rules. More specifically, for the training set we filtered out knowledge that is irrelevant to, or not useful for, the current dialogue or response.
- ./seq2seq/outputs/20190511/valid/2019-5-1-transformer-8w_step_9000 is the seq2seq model file trained on the training set. I will upload it soon.
- preclean.yml is shared by seq2seq/preclean.py, preclean_baidu.py, and preclean_baidu_aug.py, not just preclean.py. You can find the corresponding parameters in the preclen_opt() function of each script. In our case we mainly use preclean_baidu.py and preclean_baidu_aug.py to produce the data rather than preclean.py, so we modified some of the config, and it may no longer fit preclean.py. Your change is correct.
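
For anyone hitting the same error, here is a minimal sketch of why a shared config breaks one script, and one possible way to tolerate it. It is not the repository's code: the parser only mirrors the five options from the usage message above, `config_to_argv` and `shared_config` are hypothetical names, and I am assuming the config keys end up as `--key value` command-line arguments. It uses argparse's `parse_known_args`, which collects unrecognized arguments instead of exiting with an error.

```python
import argparse


def build_parser():
    # Mirrors only the options shown in the usage message of preclean.py;
    # the real script may define them differently (assumption).
    parser = argparse.ArgumentParser(prog="preclean.py")
    parser.add_argument("--log_file")
    parser.add_argument("--raw_train_file")
    parser.add_argument("--raw_dev_file")
    parser.add_argument("--save_dir")
    parser.add_argument("--raw_test_file")
    return parser


def config_to_argv(config):
    # Hypothetical helper: flatten a config mapping into "--key value" pairs.
    argv = []
    for key, value in config.items():
        argv += ["--" + key, str(value)]
    return argv


# A config shared by several scripts: the last key is not known to preclean.py.
shared_config = {
    "raw_train_file": "data/train.txt",
    "save_dir": "data/",
    "train_tgt_file": "data/train.tgt",
}

parser = build_parser()
# parse_args() would exit here with "unrecognized arguments";
# parse_known_args() returns the leftovers instead.
opt, unknown = parser.parse_known_args(config_to_argv(shared_config))

print(opt.raw_train_file)
print(unknown)
```

Trimming the config to the five supported keys, as in the original post, is the simpler fix; ignoring unknown keys only makes sense if one config file really has to serve all three scripts.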