tokenizer topic

List tokenizer repositories

friso

473 stars, 94 forks

High-performance Chinese tokenizer with GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular implementation that can be easily embedded in other...

jargon

103 stars, 1 fork

Tokenizers and lemmatizers for Go

json

23 stars, 3 forks

🔋 In-place lightweight JSON parser

lex

136 stars, 9 forks

Replaced by foonathan/lexy

cang-jie

75 stars, 20 forks

Chinese tokenizer for tantivy, based on jieba-rs

rustfst

139 stars, 16 forks

Rust re-implementation of OpenFST, a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.

tokenizer

5.1k stars, 23 forks

A small library for converting tokenized PHP source code into XML (and potentially other formats)

simplemma

130 stars, 10 forks

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

liblex

28 stars, 4 forks

C library for Lexical Analysis

SNL-Compiler

35 stars, 6 forks

SNL (Small Nested Language) compiler. Maven, JUnit, tokenizer, lexer, syntax parser. Compiler principles, lexical analysis, syntax analysis.