
Do you need to tokenize your data when using a BERT/RoBERTa model?

Open zolastro opened this issue 2 years ago • 0 comments

Considering that these models ship with their own tokenization and BPE models, what format should the input files have when training a QE model with one of these LMs? Should you apply any prior tokenization or casing model to the data first?
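
For reference, this is what I mean by the models having their own tokenization: a minimal sketch (assuming the Hugging Face transformers tokenizer API, with `xlm-roberta-base` used only as an example checkpoint, not necessarily what OpenKiwi loads internally) where the subword model is applied directly to a raw, untokenized string:

```python
# Minimal sketch (not OpenKiwi code): a pretrained RoBERTa-style tokenizer
# consumes raw, untokenized text and applies its own subword (BPE/SentencePiece) model.
from transformers import AutoTokenizer

# "xlm-roberta-base" is just an example checkpoint.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

encoded = tokenizer("Don't pre-tokenize this sentence.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# Output is the model's own subword pieces plus special tokens,
# produced directly from the raw string.
```

So the question is whether the training files for OpenKiwi should be left as raw text like this, or whether they are expected to be pre-tokenized.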

Thanks in advance for your help!

zolastro · Apr 25 '22, 10:04