tokenizer topic

List tokenizer repositories

EduNLP

49
Stars
18
Forks

A library for advanced Natural Language Processing towards multi-modal educational items.

Text-Classification-LSTMs-PyTorch

62
Stars
21
Forks

This repository provides a baseline model for text classification, implemented as an LSTM-based model in PyTorch. In order to provide a better understanding of the model, it will be...

goselect

28
Stars
2
Forks

SQL-like 'select' interface for files

htmldoc

21
Stars
5
Forks

A token-based HTML document parser and minifier written in PHP. Extracts attribute values and text using CSS selectors.

toktok

31
Stars
0
Forks
31
Watchers

Generic tokenizer written in Nim 👑 Powered by std/lexbase and Nim's macros

lexr

16
Stars
0
Forks

Lexical analyzer for JavaScript developers

tokenizer

45
Stars
11
Forks

A simple tokenizer in Ruby for NLP tasks.

uax29

51
Stars
3
Forks

A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split words, sentences and graphemes.

gotokenizer

18
Stars
7
Forks

A tokenizer for Go based on a dictionary and bigram language models. (Currently supports only Chinese segmentation.)

strtok3

29
Stars
11
Forks

Promise-based streaming tokenizer