tokenizers
Pre-trained German tokenizers for BPE or subword embeddings?
I have been following this guide: https://huggingface.co/transformers/tokenizer_summary.html . For example, I would like to use XLNetTokenizer to generate subwords. The guide shows an example for English, but is there an equivalent for German?
from transformers import XLNetTokenizer

# Loads the SentencePiece-based tokenizer shipped with the English XLNet checkpoint
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
Hi @Yuvaraj91, you can check out all the available models (and their associated tokenizers) here:
https://huggingface.co/models?language=de&sort=downloads
You can filter by language on the left. Does that work for you?