witch-language
Easy language identification of 380 languages
Massively multilingual, easy language identification. It currently works on 380 languages.
Prerequisites
Install NLTK:
pip3 install --user nltk
Download the UDHR2 dataset:
echo "import nltk; nltk.download('udhr2')" | python3
Usage
echo "Fufú kele madya ya bilûmbu nyonso na Afelika ya Kati." | python3 langid.py
Top 10 Guesses:
Koongo (kng): 0.519163
Kituba (Democratic Republic of Congo) (ktu): 0.479616
Lozi (loz): 0.000642021
Kaonde (kqn): 0.00050906
Nyamwezi (nym): 3.45846e-05
Luba-Lulua (lua): 2.30121e-05
Lingala (lin): 5.39564e-06
Bemba (Zambia) (bem): 2.98133e-06
Swahili (individual language) (swh): 2.55979e-06
Sukuma (suk): 6.94787e-07
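
To use the guesses from another Python program, the printed output can be parsed back into structured data. The following is a minimal sketch, not part of witch-language itself: the helper names `parse_guesses` and `identify` are hypothetical, and the code assumes the output format is exactly as shown above (a header line followed by `Name (code): probability` lines) and that `langid.py` is in the current directory.

```python
import re
import subprocess
import sys

# Matches lines such as "Kituba (Democratic Republic of Congo) (ktu): 0.479616".
# The greedy name group keeps any parenthesised qualifier as part of the name;
# the last parenthesised token before the colon is taken as the language code.
GUESS_LINE = re.compile(r"^(?P<name>.+) \((?P<code>[^()]+)\): (?P<prob>\S+)$")


def parse_guesses(output):
    """Turn langid.py's printed guesses into (name, code, probability) tuples."""
    guesses = []
    for line in output.splitlines():
        match = GUESS_LINE.match(line.strip())
        if match is None:
            continue  # skips the "Top 10 Guesses:" header and blank lines
        guesses.append((match["name"], match["code"], float(match["prob"])))
    return guesses


def identify(text, script="langid.py"):
    """Pipe text to langid.py on stdin, as in the shell example, and parse the result.

    Hypothetical wrapper: the script path is an assumption about your checkout.
    """
    result = subprocess.run(
        [sys.executable, script],
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return parse_guesses(result.stdout)
```

The guesses are returned in the order they were printed, so the first tuple from `parse_guesses` is the most probable language.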
Add the --help command-line flag to see more options.
Running under Python 3 gives much better performance than Python 2 for this task.