tokenizers topic

Repositories tagged with the tokenizers topic

ginza-transformers

17 stars · 4 forks

Use custom tokenizers in spacy-transformers

xef

164 stars · 16 forks

Building applications with LLMs through composability, in Kotlin, Scala, ...

count-tokens-hf-datasets

22 stars · 1 fork

This project shows how to derive the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.

magnet

19 stars · 1 fork

The small distributed language model toolkit: fine-tune state-of-the-art LLMs anywhere, rapidly.

LongRoPE

82 stars · 8 forks

Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"