transformer-xl
Questions For Configuring a new Corpus
Hello,
I have a few questions about adding a new corpus and training on it:
Adding New Corpus:
- The second parameter of "vocab.encode_file" is "ordered". What is its purpose, and when should it be set to true or false?
- The third and fourth parameters of "vocab.encode_file" are "add_eos" and "add_double_eos". When should each be used? For example, for the character-level datasets you add neither, for "ptb" you add a single eos, and for "lm1b" you add two. Does that mean the character-level datasets do not need an eos, or that it is already included in the data, whereas it is not included in "ptb"? And in that case, why are two eos tokens added for "lm1b"?
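To make the two questions above concrete, here is a minimal standalone sketch of how I assume these flags behave. `tokenize_line` and `encode_lines` are hypothetical helpers written for illustration, not the repository's functions, and the `<eos>` and `<S>` symbols are my guess at the markers involved. Please correct me if the real behaviour differs.

```python
def tokenize_line(line, add_eos=False, add_double_eos=False, delimiter=None):
    """Hypothetical stand-in for the per-line tokenizer used when encoding a file."""
    line = line.strip()
    # Assumption: an empty delimiter means character-level, so every
    # character (including spaces) becomes its own symbol.
    symbols = list(line) if delimiter == "" else line.split(delimiter)
    if add_double_eos:
        # Assumed lm1b-style: a boundary marker on both sides of the sentence.
        return ["<S>"] + symbols + ["<S>"]
    if add_eos:
        # Assumed ptb-style: a single end-of-sentence marker.
        return symbols + ["<eos>"]
    return symbols


def encode_lines(lines, ordered=False, **kwargs):
    """Hypothetical stand-in for encode_file, working on a list of lines."""
    encoded = [tokenize_line(line, **kwargs) for line in lines]
    if ordered:
        # Assumption: ordered=True joins all lines into one continuous stream,
        # preserving the order of the corpus.
        return [symbol for line in encoded for symbol in line]
    # Assumption: ordered=False keeps one sequence per line (sentence).
    return encoded


if __name__ == "__main__":
    corpus = ["the cat sat", "on the mat"]
    print(encode_lines(corpus, ordered=True, add_eos=True))
    print(encode_lines(corpus, ordered=False, add_double_eos=True))
```

If this reading is right, "ordered" would matter mostly for corpora where context carries across lines, and the eos flags would only decide which boundary markers are attached to each line.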
Training New Corpus:
- Is it better to use the "adaptive" parameter (adaptive softmax) only for word-level models, and not for character-level ones?
- What is the difference between "tgt_len", "mem_len", and "eval_tgt_len"? As far as I understand, "tgt_len" is the maximum length for the vanilla transformer, "mem_len" is the number of previous tokens to look back at, and "eval_tgt_len" is the extended length used during evaluation. Is that correct?
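Here is a toy sketch of how I currently picture the interaction between the segment length and the memory length. It is only my assumption, not the repository's code: `iterate_segments` is a hypothetical helper, and it caches raw tokens where the real model would presumably cache hidden states.

```python
def iterate_segments(tokens, tgt_len, mem_len):
    """Yield (memory, segment) pairs over a toy token stream.

    Assumption: each step predicts `tgt_len` tokens while also attending
    to at most `mem_len` items cached from earlier steps.
    """
    memory = []
    for start in range(0, len(tokens), tgt_len):
        segment = tokens[start:start + tgt_len]
        yield list(memory), segment
        # Keep only the most recent mem_len items for the next step.
        # (mem_len == 0 is handled separately because [-0:] keeps everything.)
        memory = (memory + segment)[-mem_len:] if mem_len > 0 else []


if __name__ == "__main__":
    stream = list(range(10))
    # Training-style call: tgt_len would be the segment length.
    for memory, segment in iterate_segments(stream, tgt_len=4, mem_len=3):
        print("memory:", memory, "segment:", segment)
    # Evaluation-style call: I assume eval_tgt_len simply replaces tgt_len here.
    for memory, segment in iterate_segments(stream, tgt_len=2, mem_len=3):
        print("memory:", memory, "segment:", segment)
```

Under this picture, the context visible at each step is at most "tgt_len" plus "mem_len" positions, and "eval_tgt_len" only changes the segment length at evaluation time.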