easy_seq2seq
[Question] Is it OK to replace train/test.enc/dec with a new dataset?
I still have a few questions about this module:
- What's the correct way to know whether the bot has finished learning train.enc/dec? Global perplexity near 0? Or should I worry about bucket perplexity as well?
- After I finish training on one train.enc/dec, is it OK to replace it with a new one? Should I worry that it will recreate the vocab or something like that?
Thanks again for making this module!
@kauegimenes with regard to perplexity:
It depends! A "good" perplexity score depends on whether you're learning from a closed or an open domain.
As far as I can tell, for open-domain language models the state-of-the-art perplexity is roughly 24-30 (reported by Google in February 2016). You can read more in "Exploring the Limits of Language Modeling" at https://arxiv.org/pdf/1602.02410.pdf
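In other words, you won't get anywhere near 0 perplexity any time soon ;) (Strictly speaking, perplexity is the exponential of the average per-token loss, so its floor is 1, not 0.)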
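To make the numbers above concrete, here is a minimal sketch of how perplexity relates to the training loss, assuming the loss is the mean per-token cross-entropy in nats. The function names and the overflow cap of 300 are illustrative choices, not something taken from this repo.

```python
import math


def perplexity(avg_cross_entropy, cap=300.0):
    """Perplexity for a mean per-token cross-entropy given in nats.

    `cap` is an assumed guard: very large losses (common early in
    training) would overflow math.exp, so we report infinity instead.
    """
    if avg_cross_entropy >= cap:
        return float("inf")
    return math.exp(avg_cross_entropy)


def perplexity_from_token_probs(probs):
    """Perplexity from the probabilities the model gave to the correct tokens."""
    # Mean negative log-likelihood over the tokens.
    nll = -sum(math.log(p) for p in probs) / len(probs)
    return perplexity(nll)
```

A loss of 0 gives a perplexity of exactly 1, which is the lowest value possible, and a model that assigns probability 0.25 to every correct token has a perplexity of 4. A reported perplexity of 30 therefore corresponds to a loss of about ln(30) ≈ 3.4.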
Can you share the data? I ran pull_data.sh, but it couldn't download the files. Thank you!
Download the files from these URLs manually and place them in the data folder:
- https://www.dropbox.com/s/ncfa5t950gvtaeb/test.enc?dl=0
- https://www.dropbox.com/s/48ro4759jaikque/test.dec?dl=0
- https://www.dropbox.com/s/gu54ngk3xpwite4/train.enc?dl=0
- https://www.dropbox.com/s/g3z2msjziqocndl/train.dec?dl=0
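If you would rather script the manual download, something like the sketch below might work. It assumes Python 3, that the data folder is named `data`, and that switching Dropbox's `dl=0` parameter to `dl=1` returns the raw file instead of the preview page. The helper names are made up for illustration, and the links may no longer be live.

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlretrieve

# Share links copied from the comment above.
URLS = [
    "https://www.dropbox.com/s/ncfa5t950gvtaeb/test.enc?dl=0",
    "https://www.dropbox.com/s/48ro4759jaikque/test.dec?dl=0",
    "https://www.dropbox.com/s/gu54ngk3xpwite4/train.enc?dl=0",
    "https://www.dropbox.com/s/g3z2msjziqocndl/train.dec?dl=0",
]


def direct_url(share_url):
    """Return the share link with dl=1 (assumed to trigger a direct download)."""
    parts = urlsplit(share_url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


def local_name(share_url):
    """File name to save under, taken from the URL path (e.g. 'train.enc')."""
    return os.path.basename(urlsplit(share_url).path)


def download_all(urls=URLS, data_dir="data"):
    """Fetch every file into data_dir, creating the folder if needed."""
    os.makedirs(data_dir, exist_ok=True)
    for url in urls:
        urlretrieve(direct_url(url), os.path.join(data_dir, local_name(url)))
```

Calling `download_all()` from the repo root should leave `test.enc`, `test.dec`, `train.enc` and `train.dec` in `data/`. If a saved file turns out to be an HTML page, the direct-download assumption did not hold and the manual route is the fallback.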