klaam
Arabic speech recognition, classification and text-to-speech.
Arabic speech recognition, classification, and text-to-speech using models such as wav2vec2 and fastspeech2. This repository supports both training and prediction with pretrained models.
Usage
Speech Classification
from klaam import SpeechClassification
model = SpeechClassification()
wav_file = 'file.wav'  # path to a WAV recording
model.classify(wav_file)
Speech Recognition
from klaam import SpeechRecognition
model = SpeechRecognition()
wav_file = 'file.wav'  # path to a WAV recording
model.transcribe(wav_file)
Text To Speech
from klaam import TextToSpeech
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"
model = TextToSpeech(
    prepare_tts_model_path,
    model_config_path,
    train_config_path,
    vocoder_config_path,
    speaker_pre_trained_path,
)
# sample_text is the Arabic text to synthesize
model.synthesize(sample_text)
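The TextToSpeech constructor above takes five relative paths, so it can help to confirm they resolve before loading the model. The following is a minimal sketch of such a check; `missing_paths` and `TTS_PATHS` are hypothetical names introduced here for illustration and are not part of klaam.

```python
from pathlib import Path

# Paths copied from the usage example above; adjust them to your checkout.
TTS_PATHS = {
    "prepare_tts_model_path": "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml",
    "model_config_path": "../cfgs/FastSpeech2/config/Arabic/model.yaml",
    "train_config_path": "../cfgs/FastSpeech2/config/Arabic/train.yaml",
    "vocoder_config_path": "../cfgs/FastSpeech2/model_config/hifigan/config.json",
    "speaker_pre_trained_path": "../data/model_weights/hifigan/generator_universal.pth.tar",
}


def missing_paths(paths, base_dir="."):
    """Return the names of entries whose file does not exist under base_dir.

    Hypothetical helper, not a klaam function.
    """
    base = Path(base_dir)
    return [name for name, rel in paths.items() if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_paths(TTS_PATHS)
    if missing:
        print("Missing files for:", ", ".join(missing))
    else:
        print("All TextToSpeech paths found.")
```

Running the check from the directory where you would construct the model lists any argument whose file cannot be found.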
Two recognition models are available, targeting Modern Standard Arabic (MSA) and the Egyptian dialect. Select one with the lang argument:
from klaam import SpeechRecognition
model = SpeechRecognition(lang = 'msa')
model.transcribe('file.wav')
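To transcribe several recordings with the same model, the single-file call above can be wrapped in a loop. This is a minimal sketch that relies only on the transcribe(path) call shown above; `transcribe_folder` and the "recordings" folder name are hypothetical, and the type of the returned transcription depends on klaam.

```python
from pathlib import Path


def transcribe_folder(model, folder, pattern="*.wav"):
    """Map each matching file name to the result of model.transcribe(path).

    Hypothetical helper: `model` is any object with a transcribe(path) method,
    such as the SpeechRecognition instance created above.
    """
    results = {}
    # Sort so the output order is stable across runs.
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = model.transcribe(str(path))
    return results


# Usage with klaam (requires the package and its pretrained weights):
# from klaam import SpeechRecognition
# model = SpeechRecognition(lang='msa')
# print(transcribe_folder(model, "recordings"))
```

Passing the model in as an argument means it is loaded once and reused for every file.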
Datasets
Dataset | Description | Link |
---|---|---|
MGB-3 | Egyptian Arabic speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours have been collected from YouTube. | Requires registration here |
ADI-5 | More than 50 hours collected from Aljazeera TV. Covers four regional dialects, Egyptian (EGY), Levantine (LAV), Gulf (GLF), and North African (NOR), plus Modern Standard Arabic (MSA). This dataset is part of the MGB-3 challenge. | Requires registration here |
Common Voice | Multilingual dataset available on Hugging Face | here. |
Arabic Speech Corpus | Arabic dataset with alignment and transcriptions | here. |
Models
We currently support four models, three of which are available through the transformers library.
Language | Description | Source |
---|---|---|
Egyptian | Speech recognition | wav2vec2-large-xlsr-53-arabic-egyptian |
Standard Arabic | Speech recognition | wav2vec2-large-xlsr-53-arabic |
EGY, NOR, LAV, GLF, MSA | Speech classification | wav2vec2-large-xlsr-dialect-classification |
Standard Arabic | Text-to-Speech | fastspeech2 |
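The classification model's row lists dialects by code only. A small lookup, using the expansions given in the ADI-5 dataset description, makes those codes readable. This is a sketch: `DIALECTS` and `dialect_name` are hypothetical names, and it assumes the classifier reports these same five codes, which the README does not state.

```python
# Code-to-name mapping taken from the ADI-5 description in the Datasets table.
DIALECTS = {
    "EGY": "Egyptian",
    "LAV": "Levantine",
    "GLF": "Gulf",
    "NOR": "North African",
    "MSA": "Modern Standard Arabic",
}


def dialect_name(code):
    """Return the full name for a dialect code, ignoring case and whitespace.

    Hypothetical helper, not part of klaam.
    """
    key = code.strip().upper()
    if key not in DIALECTS:
        raise KeyError(f"unknown dialect code: {code!r}")
    return DIALECTS[key]
```

For example, `dialect_name("glf")` returns the string "Gulf", while an unrecognized code raises KeyError.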
Example Notebooks
Name | Description | Notebook |
---|---|---|
Demo | Classification, recognition, and text-to-speech in a few lines of code. | |
Demo with mic | Audio recognition and classification with recording. | |