tokenizers topic

Repositories tagged with the tokenizers topic

ginza-transformers

16 Stars, 5 Forks

Use custom tokenizers in spacy-transformers

xef

174 Stars, 15 Forks

Building applications with LLMs through composability, in Kotlin, Scala, ...

count-tokens-hf-datasets

22 Stars, 1 Fork

This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.

magnet

26 Stars, 2 Forks

The small distributed language model toolkit; fine-tune state-of-the-art LLMs anywhere, rapidly

LongRoPE

82 Stars, 8 Forks

Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"