tokenization topic

List tokenization repositories

php-text-analysis

515 stars, 88 forks

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language

razdel

248 stars, 29 forks

Rule-based token and sentence segmentation for the Russian language

vibrato

303 stars, 14 forks

🎤 vibrato: Viterbi-based accelerated tokenizer

l8w8jwt

130 stars, 39 forks

Minimal, OpenSSL-less and super lightweight JWT library written in C.

charformer-pytorch

119 stars, 9 forks

Implementation of the GBST block from the Charformer paper, in PyTorch

vaulty

61 stars, 9 forks

Tokenize, encrypt/decrypt, and mask your data on the fly with the Vaulty proxy

dlp-dataflow-deidentification

87 stars, 53 forks

Multi-cloud data tokenization solution using Dataflow and Cloud DLP

wink-tokenizer

59 stars, 12 forks

Multilingual tokenizer that automatically tags each token with its type

TweebankNLP

100 stars, 8 forks

An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset

nlp-cheat-sheet-python

203 stars, 64 forks

NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition