text-tokenization topic

Repositories tagged text-tokenization

tokenmonster

549 stars, 19 forks

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
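To show why an "ungreedy" tokenizer can beat a greedy one, here is a minimal sketch. It is not TokenMonster's algorithm, which this listing does not describe. Instead it compares greedy longest-match tokenization with a plain dynamic-programming segmentation that picks the fewest tokens. The function names and the tiny vocabulary are hypothetical and exist only for illustration.

```python
def greedy_tokenize(text, vocab):
    """Longest-match-first: at each position take the longest vocab entry."""
    tokens, i = [], 0
    max_len = max(map(len, vocab))
    while i < len(text):
        # Try the longest candidate first, shrinking until one is in the vocab.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers position {i}")
    return tokens


def ungreedy_tokenize(text, vocab):
    """Dynamic programming: choose the segmentation with the fewest tokens.

    Illustrative stand-in for a non-greedy tokenizer, not TokenMonster's method.
    """
    n = len(text)
    max_len = max(map(len, vocab))
    # best[i] = (fewest tokens covering text[:i], start index of the last token)
    best = [None] * (n + 1)
    best[0] = (0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and text[j:i] in vocab:
                cand = best[j][0] + 1
                if best[i] is None or cand < best[i][0]:
                    best[i] = (cand, j)
    if best[n] is None:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the tokens.
    tokens, i = [], n
    while i > 0:
        j = best[i][1]
        tokens.append(text[j:i])
        i = j
    return tokens[::-1]


# Hypothetical toy vocabulary where greedy matching is suboptimal.
vocab = {"a", "b", "c", "d", "ab", "bcd"}
print(greedy_tokenize("abcd", vocab))    # ['ab', 'c', 'd']
print(ungreedy_tokenize("abcd", vocab))  # ['a', 'bcd']
```

Greedy matching commits to "ab" and then needs two more tokens. The DP search looks past the locally longest match and finds a two-token split.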

split-markdown4gpt

20 stars, 2 forks

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows t...
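A minimal sketch of the general idea of splitting Markdown into token-limited sections follows. It is not split-markdown4gpt's implementation or API. The helpers `count_tokens`, `split_sections`, and `split_markdown` are hypothetical. Token counting is approximated by whitespace-separated words, whereas a real GPT tokenizer would count differently. The document is cut before each ATX heading, and consecutive sections are packed into chunks that stay within the limit.

```python
import re


def count_tokens(text):
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def split_sections(markdown):
    """Split before each ATX heading line (#, ##, ... up to ######)."""
    # Zero-width split on a lookahead; supported by re.split since Python 3.7.
    parts = re.split(r"(?m)^(?=#{1,6}\s)", markdown)
    return [p for p in parts if p.strip()]


def split_markdown(markdown, limit):
    """Greedily pack whole sections into chunks of at most `limit` tokens.

    A single section longer than `limit` is kept as one oversized chunk.
    """
    chunks, current = [], ""
    for section in split_sections(markdown):
        candidate = current + section
        if current and count_tokens(candidate) > limit:
            chunks.append(current)
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "# A\none two\n## B\nthree four five\n## C\nsix\n"
for chunk in split_markdown(doc, limit=9):
    print(repr(chunk))
```

This sketch does not detect fenced code blocks, so a line starting with `#` inside a code fence would be treated as a heading.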