NLP for Sanskrit
This repository contains state-of-the-art language models and a classifier for Sanskrit, an ancient Indian language.
The models trained here are used in the Natural Language Toolkit for Indic Languages (iNLTK).
Dataset
Created as part of this project
Results
Language Model Perplexity
| Architecture/Dataset | Sanskrit Wikipedia Articles |
|---|---|
| ULMFiT | ~6 |
| TransformerXL | ~3 |
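For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean negative log-likelihood per token, so lower is better. The sketch below illustrates that definition only; the function name `perplexity` and the sample log-probabilities are hypothetical, and it is an assumption that the figures above were computed with this convention (natural log, per-token average).

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for a list of natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical data: a model that assigns probability 1/6 to every token
# is as uncertain as a uniform choice among six options.
uniform_six = [math.log(1 / 6)] * 10
print(round(perplexity(uniform_six), 3))  # approximately 6
```

Read this way, a perplexity of about 6 means the model is, on average, as uncertain as a uniform choice among roughly six tokens.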
Classification Metrics
ULMFiT
| Dataset | Accuracy (%) | Kappa Score (×100) |
|---|---|---|
| Sanskrit Shlokas Dataset | 84.3 | 76.1 |
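As a reminder of what the two columns measure, the sketch below computes accuracy and Cohen's kappa from a pair of label lists using only the standard library. Kappa corrects observed agreement for the agreement expected by chance. The function names and the toy labels are hypothetical; this is not the evaluation code used in this repository, and it is an assumption that the reported kappa is Cohen's kappa scaled by 100.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels, not taken from the Sanskrit Shlokas Dataset.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "a", "b", "a"]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

In the toy example, three of four predictions match (accuracy 0.75) while chance agreement is 0.5, so kappa is lower than accuracy. The same relationship holds between the 84.3 and 76.1 in the table.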
Visualizations
Embedding Space
| Architecture | Visualization |
|---|---|
| ULMFiT | Embeddings projection |
| TransformerXL | Embeddings projection |
Pretrained Language Model
Download the pretrained language model from here
Classifier
Download the classifier from here
Tokenizer
The tokenizer was trained using Google's SentencePiece.
Download the trained model and vocabulary from here