tokenizers topic
tokenizers repositories
ginza-transformers
16 stars, 5 forks
Use custom tokenizers in spacy-transformers
xef
174 stars, 15 forks
Building applications with LLMs through composability, in Kotlin, Scala, ...
count-tokens-hf-datasets
22 stars, 1 fork
This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Dataflow.
magnet
26 stars, 2 forks
The small distributed language model toolkit; fine-tune state-of-the-art LLMs anywhere, rapidly.
LongRoPE
82 stars, 8 forks
Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens".