tokenizer topic
lexertk
C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
MicroTokenizer
A micro Chinese word segmentation engine with comprehensive algorithm coverage | A micro tokenizer for Chinese
chevrotain
Parser Building Toolkit for JavaScript
natasha
Solves basic Russian NLP tasks, API for lower level Natasha projects
kagome
Self-contained Japanese Morphological Analyzer written in pure Go
open-korean-text
Open Korean Text Processor - An Open-source Korean Text Processor
text2text
Text2Text: Crosslingual NLP/G toolkit
php-parser
NodeJS PHP Parser - extract AST or tokens
cogcomp-nlp
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, t...
nagisa
A Japanese tokenizer based on recurrent neural networks