tokenizer topic

List tokenizer repositories

kanpyo

98
Stars
1
Forks
Watchers

Japanese Morphological Analyzer written in Rust

com.doji.transformers

26
Stars
3
Forks
26
Watchers

A Unity package to run pretrained transformer models with Unity Sentis

BBPE

37
Stars
3
Forks
37
Watchers

Low-level implementation of BBPE

tokenizers

28
Stars
1
Forks
28
Watchers

A lightweight, dependency-free fork of transformers.js (tokenizers only)

Amharic-Tokenizer

96
Stars
14
Forks
96
Watchers

Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.

bge-m3-onnx

21
Stars
3
Forks
21
Watchers

ONNX implementation of the BGE-M3 multilingual embedding model and tokenizer with native C#, Java, and Python implementations. Generates all three embedding types: dense, sparse, and ColBERT vectors.

Deepdive-llama3-from-scratch

612
Stars
50
Forks
612
Watchers

Implement llama3 inference step by step: grasp the core concepts, work through the derivations, and write the code.

zon-TS

41
Stars
3
Forks
41
Watchers

ZON → 35-70% cheaper LLM prompts than JSON/TOON. Zero overhead.

Tiny-Lua-Compiler

120
Stars
7
Forks
120
Watchers

⛄Possibly the smallest Lua compiler ever

MathParser.lua

19
Stars
2
Forks
19
Watchers

An elegant Math Evaluator written in Lua, featuring support for adding custom operators and functions