knowledge-driven-dialogue-lic2019

seq2seq missing file

Open AngusMonroe opened this issue 5 years ago • 1 comments

./seq2seq/data/invalid_kg.json and ./seq2seq/outputs/20190511/valid/2019-5-1-transformer-8w_step_9000.pt seem to be missing. Could you add them, or tell me how to generate these files?

I also got an "unrecognized arguments" error when I ran preclean.py with the default config:

usage: preclean.py [-h] [--log_file LOG_FILE]
                   [--raw_train_file RAW_TRAIN_FILE]
                   [--raw_dev_file RAW_DEV_FILE] [--save_dir SAVE_DIR]
                   [--raw_test_file RAW_TEST_FILE]
preclean.py: error: unrecognized arguments: --train_sample_file_save data/train_sample.txt --dev_sample_file_save data/dev_sample.txt --test_sample_file_save data/test_sample.txt --train_text_file_save data/train.src --dev_text_file_save data/dev.src --test_text_file_save data/test.src --train_topic_file_save data/train_topic.txt --dev_topic_file_save data/dev_topic.txt --test_topic_file_save data/test_topic.txt --train_tgt_file data/train.tgt --dev_tgt_file data/dev.tgt --test_tgt_file data/test.tgt

It runs successfully when I change ./seq2seq/config/preclean.yml as follows:

raw_train_file: "data/train.txt"
raw_dev_file: "data/dev.txt"
raw_test_file: "data/test.txt"
log_file: "outputs/log/log.txt"
save_dir: "data/"

Please check if your configuration is wrong or if I am missing any steps that cause this problem.

Thank you!

AngusMonroe, Jul 22 '19 08:07

@AngusMonroe

  1. invalid_kg.json holds the knowledge that was filtered by manually defined rules. More specifically, for the training set we filtered knowledge that is irrelevant or not useful to the current dialogue or response.
  2. ./seq2seq/outputs/20190511/valid/2019-5-1-transformer-8w_step_9000 is the seq2seq model file trained on the training set. I will upload it soon.
  3. preclean.yml is shared by seq2seq/preclean.py, preclean_baidu.py, and preclean_baidu_aug.py, not just preclean.py. You can find the parameters each script expects in its preclen_opt() function. In our case we mainly used preclean_baidu.py and preclean_baidu_aug.py to produce the data, rather than preclean.py, so we modified some config entries, and the file may no longer fit preclean.py. Your change is right.
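
To illustrate item 3, here is a minimal sketch of how a shared config can trip up one script's argument parser. It assumes the config keys are turned into `--key value` command-line arguments, which is what the error message suggests but is not confirmed here. `build_parser`, `config_to_argv`, and `split_config` are hypothetical names for illustration, not functions from the repository. The flags in `build_parser` are copied from the usage message above.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the options preclean.py accepts,
    # taken from the usage message in this issue.
    parser = argparse.ArgumentParser(prog="preclean.py", allow_abbrev=False)
    for flag in ("log_file", "raw_train_file", "raw_dev_file",
                 "save_dir", "raw_test_file"):
        parser.add_argument("--" + flag)
    return parser


def config_to_argv(config):
    # Assumption: each config entry becomes a "--key value" pair.
    argv = []
    for key, value in config.items():
        argv += ["--" + key, str(value)]
    return argv


def split_config(parser, config):
    # parse_known_args collects unrecognized arguments instead of exiting,
    # so the entries meant for other scripts can be listed.
    known, unknown = parser.parse_known_args(config_to_argv(config))
    return vars(known), unknown


# A shared config containing one key that preclean.py does not define.
shared_config = {
    "raw_train_file": "data/train.txt",
    "save_dir": "data/",
    "train_tgt_file": "data/train.tgt",
}

accepted, rejected = split_config(build_parser(), shared_config)
print("accepted:", accepted)
print("rejected:", rejected)
```

Any key that ends up in `rejected` belongs to another script's options and can be removed from the config before running preclean.py, which is effectively what the trimmed preclean.yml above does.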

circlePi, Jul 23 '19 01:07