tokenizer topic
kanpyo
Japanese Morphological Analyzer written in Rust
com.doji.transformers
A Unity package to run pretrained transformer models with Unity Sentis
tokenizers
a lightweight no-dependency fork from transformers.js (only tokenizers)
Amharic-Tokenizer
Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.
bge-m3-onnx
ONNX implementation of the BGE-M3 multilingual embedding model and tokenizer with native C#, Java, and Python implementations. Generates all three embedding types: dense, sparse, and ColBERT vectors.
Deepdive-llama3-from-scratch
Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code.
zon-TS
ZON → 35-70% cheaper LLM prompts than JSON/TOON. Zero overhead.
Tiny-Lua-Compiler
⛄Possibly the smallest Lua compiler ever
MathParser.lua
An elegant Math Evaluator written in Lua, featuring support for adding custom operators and functions