tokenizer topic

List tokenizer repositories

friso

472 Stars, 94 Forks

High-performance Chinese tokenizer supporting both GBK and UTF-8 charsets, based on the MMSEG algorithm and written in ANSI C. Fully modular in implementation and can be easily embedded in other...

jargon

103 Stars, 1 Fork

Tokenizers and lemmatizers for Go

json

23 Stars, 3 Forks

🔋 In-place lightweight JSON parser

lex

136 Stars, 9 Forks

Replaced by foonathan/lexy

cang-jie

74 Stars, 20 Forks

Chinese tokenizer for tantivy, based on jieba-rs

rustfst

138 Stars, 16 Forks

Rust re-implementation of OpenFST, a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.

tokenizer

5.1k Stars, 23 Forks

A small library for converting tokenized PHP source code into XML (and potentially other formats)

simplemma

126 Stars, 9 Forks

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

liblex

28 Stars, 4 Forks

C library for Lexical Analysis

SNL-Compiler

35 Stars, 6 Forks

SNL (Small Nested Language) Compiler. Maven, JUnit, tokenizer, lexer, syntax parser. Compiler principles, lexical analysis, syntax analysis.