transformer-xl character level representation

character level representation

Open goaaron opened this issue 5 years ago • 0 comments

I initially want to train Transformer-XL at the character level on my own dataset, à la text8, where the individual words in the dataset are unrelated and separated by a special character.

Later I want to train it on another dataset where the words are related and separated by a different special character (I don't have this dataset yet).

How can I configure the initial training so that this would be workable? Would adding <eos> in the tokenizer whenever I hit a separator character be OK?
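To make the question concrete, here is a minimal sketch of the idea in plain Python. It is not the repository's tokenizer: `tokenize_chars`, `build_vocab`, the `|` separator, and the reserved `<sep2>` symbol are all hypothetical names chosen for illustration. It assumes each separator character is replaced by a single `<eos>` token, and that a second symbol is reserved in the vocabulary up front so that the later dataset's separator would not change the embedding size.

```python
EOS = "<eos>"

def tokenize_chars(text, separators=("|",)):
    """Split text into single-character tokens, emitting EOS for each separator."""
    return [EOS if ch in separators else ch for ch in text]

def build_vocab(tokens, reserved=()):
    """Map each distinct symbol to an integer id, in order of first appearance.

    EOS and any reserved symbols are added first so their ids stay fixed
    regardless of which characters appear in the data.
    """
    vocab = {}
    for sym in (EOS, *reserved, *tokens):
        vocab.setdefault(sym, len(vocab))
    return vocab

# Hypothetical first dataset: unrelated words separated by "|".
tokens = tokenize_chars("cat|dog|")
# "<sep2>" is a placeholder for the second dataset's separator (an assumption).
vocab = build_vocab(tokens, reserved=("<sep2>",))
ids = [vocab[t] for t in tokens]
```

Whether reserving the second symbol this way is enough for the model to learn a different meaning for it later is exactly the open part of the question.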

May 24 '19 18:05 goaaron
