tokenization topic
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
textoken
Simple and customizable text tokenization gem.
spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
databunker
Secure SDK/vault for personal records/PII built to comply with GDPR
spacy-streamlit
👑 spaCy building blocks and visualizers for Streamlit apps
lunasec
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTra...
ClangKit
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics, and fix-its are currently implemented.
sudachi.rs
Sudachi in Rust 🦀 and the new generation of SudachiPy
Ravencoin
Ravencoin Core integration/staging tree
codechain
CodeChain's official implementation in Rust.