Aditya Tiwari
I also faced the same issue when training with the ByteLevelBPETokenizer suggested in https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb#scrollTo=IMnymRDLe0hi

Tokenizer training:

```
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    iterator=LIST_OF_STRINGS,  # an in-memory list of training strings
    vocab_size=52000,
    min_frequency=2,
    special_tokens=[
        "<s>",
        "<pad>",
        "</s>",
        "<unk>",
        "<mask>",
    ],
)
```
...
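For context, a minimal sketch of how a tokenizer trained this way is typically saved and reloaded for model training, following the linked notebook's workflow; the directory name and `max_len` value here are assumptions for illustration, not taken from the comment above:

```
from transformers import RobertaTokenizerFast

# Assumed output directory; save_model writes vocab.json and merges.txt there.
tokenizer.save_model("./my_tokenizer")

# Reload the trained vocabulary/merges as a fast tokenizer for the model.
reloaded = RobertaTokenizerFast.from_pretrained("./my_tokenizer", max_len=512)
print(reloaded.tokenize("Hello world"))
```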