Tokenizer topic

List of tokenizer repositories

python-mecab

28 stars, 7 forks

Python 3.5+ bindings for MeCab, written without SWIG or pybind. (No longer maintained.)

esperanto-analyzer

28 stars, 1 fork

Morphological and syntactic analysis of Esperanto sentences

transphone

157 stars, 16 forks

Phoneme tokenizer and grapheme-to-phoneme model for 8k languages

html-lexer

24 stars, 4 forks

HTML5-compliant lexer

Lisp-esque-language

27 stars, 2 forks

💠The Lel programming language

guide-to-interpreters-series

183 stars, 29 forks

Contains source code for viewers following along with my Beginners Guide To Building Interpreters series on my YouTube channel.

Loretta

117 stars, 11 forks

A C# library for parsing, analyzing, transforming, and generating Lua, GLua, and Luau code.

python-vibrato

34 stars, 1 fork

Viterbi-based accelerated tokenizer (Python wrapper)

tiptap-annotation-magic

25 stars, 0 forks

An extension for the Tiptap editor that enables text annotation. It supports overlapping annotations, which is useful for, e.g., NLP tokenization.

tivars_lib_cpp

25 stars, 7 forks

A C++ library for interacting with TI-z80 (82/83/84 series) calculator files (programs, lists, matrices, etc.).