tokenizer topic
EduNLP
A library for advanced Natural Language Processing towards multi-modal educational items.
Text-Classification-LSTMs-PyTorch
The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. In order to provide a better understanding of the model, it will be...
goselect
SQL-like 'select' interface for files
htmldoc
A token-based HTML document parser and minifier written in PHP. Extracts attribute values and text using CSS selectors.
toktok
Generic tokenizer written in the Nim language 👑 Powered by std/lexbase and Nim's macros
lexr
Lexical analyzer for JavaScript developers
tokenizer
A simple tokenizer in Ruby for NLP tasks.
uax29
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Splits text into words, sentences and graphemes.
gotokenizer
A tokenizer for Go based on a dictionary and bigram language models. (Currently supports only Chinese segmentation.)