TurkicASR icon indicating copy to clipboard operation
TurkicASR copied to clipboard

A multilingual ASR model that can recognize ten Turkic languages—Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek.

TurkicASR

This repository provides the recipe for the paper Multilingual Speech Recognition for Turkic Languages.

Pre-trained models

You can download the best performing models below.

model
turkic_languages_model.zip
all_languages_model.zip

Inference

To convert your audio file to text, please make sure it follows a wav format with sample rate of 16k. Unzip the pre-trained model in the current directory, and install the necessary packages by running pip install -r requirements.txt. To perform the evaluation please run:

python recognize.py -f <path_to_your_wav>

Datasets

There are multiple datasets involved, including KSC, TSC, USC, and Common Voice version 10.0 for the following languages: Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Turkish, Tatar, Uzbek, and Uyghur. To train the ASR model, please download all of them and specify the paths in conf/lang.conf.

Training

Our code builds upon ESPnet, and requires prior installation of the framework for DNN training. Please follow the installation guide and put the TurkicASR folder inside espnet/egs2/ directory. Run the traning scripts with ./run.sh

Citation

@Article{info14020074,
AUTHOR = {Mussakhojayeva, Saida and Dauletbek, Kaisar and Yeshpanov, Rustem and Varol, Huseyin Atakan},
TITLE = {Multilingual Speech Recognition for Turkic Languages},
JOURNAL = {Information},
VOLUME = {14},
YEAR = {2023},
NUMBER = {2},
ARTICLE-NUMBER = {74},
URL = {https://www.mdpi.com/2078-2489/14/2/74},
ISSN = {2078-2489}
}