tokenization topic
trankit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
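Word segmentation (splitting a run-together string such as a hashtag into words) is usually framed as choosing the most probable split under a word-frequency model. The sketch below shows that general idea with dynamic programming over split points. It is not ekphrasis's API or its exact method: the toy counts in `UNIGRAM_COUNTS`, the unknown-word penalty, and the function names are all hypothetical.

```python
import math
from typing import List

# Hypothetical toy unigram counts; a real segmenter would use large corpus statistics.
UNIGRAM_COUNTS = {
    "good": 50, "morning": 30, "go": 40, "od": 1, "morn": 2, "ing": 3,
    "new": 60, "york": 25, "city": 35,
}
TOTAL = sum(UNIGRAM_COUNTS.values())


def word_log_prob(word: str) -> float:
    """Log-probability of `word` under the toy unigram model."""
    count = UNIGRAM_COUNTS.get(word)
    if count is not None:
        return math.log(count / TOTAL)
    # Unseen strings get a penalty that grows with their length (an assumption).
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text: str, max_word_len: int = 20) -> List[str]:
    """Return the most probable split of `text` (Viterbi-style DP over split points)."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-prob for text[:i]
    back = [0] * (n + 1)            # back[i]: start of the last word in that best split
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            score = best[start] + word_log_prob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("goodmorning"))
print(segment("newyorkcity"))
```

A hashtag would first have its leading `#` stripped before being passed to `segment`.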
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
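BPE (byte-pair encoding) builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols. As a rough illustration of what BPE training does, here is a minimal pure-Python sketch. It does not use the Tokenizer library's API; the function names and the `</w>` end-of-word marker are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Vocab = Dict[Tuple[str, ...], int]


def get_pair_counts(vocab: Vocab) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: Tuple[str, str], vocab: Vocab) -> Vocab:
    """Replace every occurrence of `pair` with one merged symbol."""
    merged: Vocab = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a word-frequency table."""
    # Start from single characters plus an end-of-word marker.
    vocab: Vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3))
```

Production libraries add byte-level fallback, special tokens, and much faster pair bookkeeping; this sketch only shows the core merge loop.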
YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
TokenScript
TokenScript schema, specs and paper
Python_Natural_Language_Processing
This repository consists of a complete guide to natural language processing (NLP) in Python, where we'll learn various techniques for implementing NLP, including parsing & text processing, and understand...
vtext
Simple NLP in Rust with Python bindings
datacamp-python-data-science-track
All the slides, accompanying code and exercises are stored in this repo. 🎈
NLP-Cube
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
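"Pointwise prediction" tokenization treats segmentation as an independent yes/no decision at every gap between two characters, each made by a classifier over features of the nearby text. The toy sketch below shows only that idea: the feature set (left character, right character, character-type pair) and the hand-set `WEIGHTS` are hypothetical stand-ins for a trained model, and none of this is Vaporetto's API. Real pointwise tokenizers learn weights over richer character and character-type n-gram features.

```python
from typing import Dict, List


def char_type(c: str) -> str:
    """Coarse character class used as a feature (hypothetical feature set)."""
    if c.isdigit():
        return "D"
    if c.isalpha():
        return "A"
    if c.isspace():
        return "S"
    return "O"


def boundary_features(text: str, i: int) -> List[str]:
    """Features describing the gap between text[i - 1] and text[i]."""
    left, right = text[i - 1], text[i]
    return [f"L:{left}", f"R:{right}", f"T:{char_type(left)}{char_type(right)}"]


# Hand-set weights standing in for a trained linear classifier (illustrative only).
WEIGHTS: Dict[str, float] = {
    "T:AD": 1.0, "T:DA": 1.0, "T:AO": 1.0, "T:OA": 1.0, "T:DO": 1.0, "T:OD": 1.0,
    "T:AA": -1.0, "T:DD": -1.0,
}


def tokenize(text: str, weights: Dict[str, float] = WEIGHTS, bias: float = 0.0) -> List[str]:
    """Split `text` wherever the linear score for a gap is positive."""
    if not text:
        return []
    tokens: List[str] = []
    start = 0
    for i in range(1, len(text)):
        # Each gap is scored independently of the decisions at other gaps.
        score = bias + sum(weights.get(f, 0.0) for f in boundary_features(text, i))
        if score > 0:
            tokens.append(text[start:i])
            start = i
    tokens.append(text[start:])
    return tokens


print(tokenize("abc123!"))
```

Because each gap is scored on its own, the decisions can be computed in a single pass with no search over whole segmentations. This independence is what makes the pointwise approach fast.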