
DailyDialog dataset

Open Rabona17 opened this issue 2 years ago • 4 comments

Where can I get the preprocessed DailyDialog dataset used by the SpaceFusion pre-training code? Any suggestions on how to preprocess the original DailyDialog data would be appreciated. Thanks!

Rabona17, Jul 29 '21 04:07

I don't have the SpaceFusion pre-training code. On the DailyDialog dataset, we keep a dialogue history of fixed sequence length. We tried to follow the original paper's setting:

https://github.com/golsun/SpaceFusion

ChunyuanLI, Jul 29 '21 05:07
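The fixed-length history idea can be illustrated with a short sketch. This is not the authors' preprocessing (per this thread, it was not released). It assumes the raw DailyDialog text file stores one dialogue per line with utterances delimited by `__eou__`, interprets "fixed sequence length" as a cap on the number of previous turns (plus an optional cap on whitespace tokens, keeping the most recent ones), and emits tab-separated src/tgt lines. The ` EOS ` turn separator and all function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

EOU = "__eou__"     # assumed utterance delimiter in the raw DailyDialog text file
TURN_SEP = " EOS "  # hypothetical separator used to join history turns in src


def split_dialogue(line: str) -> List[str]:
    """Split one raw dialogue line into stripped, non-empty utterances."""
    # Tabs are replaced so they cannot break the tab-separated output.
    parts = (u.replace("\t", " ").strip() for u in line.split(EOU))
    return [u for u in parts if u]


def make_pairs(
    utterances: List[str],
    max_turns: int = 3,
    max_src_tokens: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Build (src, tgt) pairs: each utterance after the first is a target,
    and src holds up to `max_turns` preceding turns."""
    pairs = []
    for i in range(1, len(utterances)):
        src = TURN_SEP.join(utterances[max(0, i - max_turns):i])
        if max_src_tokens is not None:
            # Keep only the most recent whitespace tokens of the history.
            tokens = src.split()
            src = " ".join(tokens[max(0, len(tokens) - max_src_tokens):])
        pairs.append((src, utterances[i]))
    return pairs


def to_lines(raw_lines: Iterable[str], max_turns: int = 3,
             max_src_tokens: Optional[int] = None) -> List[str]:
    """Convert raw dialogue lines into 'src<TAB>tgt' lines."""
    out = []
    for line in raw_lines:
        for src, tgt in make_pairs(split_dialogue(line), max_turns, max_src_tokens):
            out.append(f"{src}\t{tgt}")
    return out


if __name__ == "__main__":
    raw = "Hi , how are you ? __eou__ Fine , thanks . __eou__ Great . __eou__\n"
    for row in to_lines([raw], max_turns=2):
        print(row)
```

Whether the history is capped by turns or by tokens, and which separator the training script expects, should be checked against the SpaceFusion repository before using output like this for training.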

Thanks. So where can I get the DailyDialog dataset you used in run_dialog_spacefusion.sh (../data/datasets/dailydialog_data/train.txt)? Or should I preprocess it myself?

Rabona17, Jul 29 '21 05:07

I'm afraid you have to pre-process it on your own.

ChunyuanLI, Jul 29 '21 06:07

Sure. Since SpaceFusion doesn't provide any preprocessing code for DailyDialog, what criteria did you use for src and trgt, or what procedure did you use to split the original DailyDialog data into src and trgt? Thanks in advance!

Rabona17, Jul 29 '21 06:07