tokenizer topic

List tokenizer repositories

Neural-Net-Zero-to-Hero-with-Andrej

97
Stars
14
Forks
Watchers

This repository contains a collection of exploratory notebooks written in pure Python and in language that we humans can read. It tries to compile all lectures from Andrej Karpathy's 💎 playlist...

character-tokenizer

29
Stars
13
Forks
29
Watchers

A character tokenizer for Hugging Face Transformers

MambaByte

108
Stars
6
Forks
Watchers

Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

AC

18
Stars
0
Forks
Watchers

Aho-Corasick (AC) automaton, text similarity retrieval, lexicon matching, tokenizer

FileQL

71
Stars
2
Forks
Watchers

A tool that allows you to run SQL-like queries on local files instead of database files using the GitQL SDK.

ChatGPT-Token-Usage-Pre-Calculator

15
Stars
4
Forks
Watchers

Perfect for anyone who needs to quickly calculate the token count of ChatGPT prompts for their project.

llama3-tokenizer-js

117
Stars
6
Forks
117
Watchers

JS tokenizer for LLaMA 3 and LLaMA 3.1

AWS-LLM-SageMaker

16
Stars
2
Forks
16
Watchers

SageMaker Polyglot-based RAG with OpenSearch

Tokenizer

120
Stars
26
Forks
Watchers

TypeScript and .NET implementation of a BPE tokenizer for OpenAI LLMs.

JSRETK

19
Stars
2
Forks
Watchers

JavaScript Reverse Engineering Toolkit (JSRETK) - Experimental tools for analyzing (minified/obfuscated) JavaScript