OpenKiwi
Do you need to tokenize your data when using a BERT/RoBERTa model?
Considering that these models ship with their own tokenization and BPE/WordPiece models, what format should the input files have when training a QE model on top of one of these LMs? Should any tokenization or casing model be applied beforehand?
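For context, my understanding is that these models' tokenizers split raw words into subwords themselves, so pre-tokenized input may interact badly with them. A minimal sketch of WordPiece-style greedy longest-match splitting, using a toy vocabulary (the real BERT vocab has ~30k entries):

```python
# Toy sketch of WordPiece-style greedy longest-match subword splitting,
# in the style of BERT tokenizers. VOCAB is a made-up toy vocabulary.
VOCAB = {"un", "##aff", "##able", "token", "##ization"}

def wordpiece(word, vocab=VOCAB, unk="[UNK]"):
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Find the longest vocab entry matching the remaining characters.
        while start < end:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand  # non-initial pieces carry a "##" prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return [unk]  # word cannot be decomposed with this vocab
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("tokenization"))  # ['token', '##ization']
print(wordpiece("unaffable"))     # ['un', '##aff', '##able']
```

This is just to illustrate why the question matters: if the input is already tokenized (e.g. punctuation split off), the subword model may segment words differently than it did during pretraining.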
Thanks in advance for your help!