tokenizer topic
Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
csslex
A very small and very fast spec-compliant CSS lexer
ScoreTransformer
The official repository for "Score Transformer: Generating Musical Scores from Note-level Representation" (MMAsia '21)
python-ucto
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This b...
GPT3-Tokenizer
GPT3 encoder & decoder tool written in Swift
elasticsearch-plugins
Some native scoring script plugins for Elasticsearch
ai21-tokenizer
Tokenizers for AI21's Jurassic models
maeel
The maeel programming language
nim-tokenizer
A simple BPE tokenizer implemented in Nim