Tokenization topic

Repositories matching the tokenization topic:

simplemma

130 stars, 10 forks

Simple multilingual lemmatizer for Python, designed for speed and efficiency.

textoken

31 stars, 3 forks

Simple and customizable text tokenization gem.
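
To illustrate what a "simple and customizable" tokenizer can look like, here is a minimal Python sketch. It is not textoken's API (textoken is a Ruby gem); the function name `tokenize`, its parameters, and the default regular expression are hypothetical choices made for illustration only.

```python
import re
from typing import List, Optional, Set

# Hypothetical default pattern: a run of word characters, or any single
# character that is neither a word character nor whitespace (punctuation).
DEFAULT_PATTERN = r"\w+|[^\w\s]"


def tokenize(text: str,
             pattern: str = DEFAULT_PATTERN,
             lowercase: bool = False,
             stopwords: Optional[Set[str]] = None) -> List[str]:
    """Split text into tokens using a configurable regular expression.

    Hypothetical helper: the options (pattern, lowercase, stopwords) stand in
    for the kind of customization a tokenization library might expose.
    """
    if lowercase:
        text = text.lower()
    # re.findall returns every non-overlapping match of the pattern, in order.
    tokens = re.findall(pattern, text)
    if stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    return tokens


if __name__ == "__main__":
    print(tokenize("Hello, world! It's 2024."))
    # Custom pattern that keeps contractions such as "It's" as one token.
    print(tokenize("Hello, world! It's 2024.", pattern=r"\w+(?:'\w+)?"))
```

With the default pattern, punctuation becomes separate tokens and "It's" is split into `It`, `'`, and `s`; swapping in a different pattern changes that behavior without touching the rest of the function.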

spaCy

29.0k stars, 4.3k forks

💫 Industrial-strength Natural Language Processing (NLP) in Python

databunker

1.2k stars, 70 forks

Secure SDK/vault for personal records/PII built to comply with GDPR

spacy-streamlit

770 stars, 115 forks

👑 spaCy building blocks and visualizers for Streamlit apps

lunasec

1.4k stars, 162 forks

LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTra...

ClangKit

359 stars, 46 forks

ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics, and fix-its are currently implemented.

sudachi.rs

276 stars, 32 forks

Sudachi in Rust 🦀 and the new generation of SudachiPy

Ravencoin

1.1k stars, 670 forks

Ravencoin Core integration/staging tree

codechain

258 stars, 56 forks

CodeChain's official implementation in Rust.