tokenizers

Pre-trained German tokenizers for BPE or subword embeddings?

Open yuvaraj91 opened this issue 4 years ago • 1 comment

I have been following this guide: https://huggingface.co/transformers/tokenizer_summary.html. For example, I would like to use XLNetTokenizer to generate subwords. The guide shows this for English, but is there an equivalent for German?

from transformers import XLNetTokenizer
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")

yuvaraj91 avatar Oct 17 '21 08:10 yuvaraj91

Hi @yuvaraj91, you can check out all available models (and their associated tokenizers) here:

https://huggingface.co/models?language=de&sort=downloads

You can filter by language in the left-hand panel. Does that work for you?
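
For illustration (not from the original thread), assuming you pick a German checkpoint from that filtered list, for example deepset/gbert-base, loading its tokenizer and producing German subwords might look like this:

from transformers import AutoTokenizer

# deepset/gbert-base is just one example of a German model from the filtered list;
# any German checkpoint on the Hub can be loaded the same way.
tokenizer = AutoTokenizer.from_pretrained("deepset/gbert-base")
print(tokenizer.tokenize("Maschinelles Lernen ist faszinierend."))

AutoTokenizer picks the tokenizer class that matches the checkpoint, so you don't need to know in advance whether the model uses BPE, WordPiece, or SentencePiece subwords.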

Narsil avatar Oct 18 '21 07:10 Narsil

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Mar 12 '24 01:03 github-actions[bot]