transformer-xl

character level representation

Open goaaron opened this issue 5 years ago • 0 comments

I initially want to train Transformer-XL at the character level on my own dataset, à la text8, where the individual words are unrelated to each other and separated by a special character.

Later I want to train the same model on another dataset where the words are related and separated by a different special character (I don't have this dataset yet).

How can I configure the initial training so that this later training would be workable? Would it be OK to have the tokenizer emit <eos> whenever it hits a separator character?
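A minimal sketch of the preprocessing described above, written in plain Python and not taken from the transformer-xl code. The separator characters `|` and `#` and the helper names `char_tokenize` and `build_vocab` are hypothetical. It assumes that reserving a symbol for the second separator up front keeps the vocabulary unchanged when the second dataset arrives.

```python
# Hypothetical separators: "|" for the first dataset, "#" for the later one.
FIRST_SEP = "|"
SECOND_SEP = "#"
EOS = "<eos>"


def char_tokenize(text, sep=FIRST_SEP, eos=EOS):
    """Split text into single-character symbols, emitting eos for each separator."""
    return [eos if ch == sep else ch for ch in text]


def build_vocab(symbols, reserved=()):
    """Assign ids in first-seen order, with reserved symbols placed first."""
    vocab = {}
    for sym in list(reserved) + list(symbols):
        if sym not in vocab:
            vocab[sym] = len(vocab)
    return vocab


# Unrelated words separated by the first separator.
tokens = char_tokenize("cat|dog|")

# Reserve <eos> and the future separator so their ids exist from the start.
vocab = build_vocab(tokens, reserved=[EOS, SECOND_SEP])
ids = [vocab[t] for t in tokens]
```

Whether <eos> is the right symbol for the first separator depends on how the training code treats it, so this only shows the mapping itself, not a recommended configuration.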

goaaron · May 24 '19 18:05