indonesian-tts
Indonesian TTS (text-to-speech) using Coqui TTS
Models are available in the Releases tab.
DO NOT USE FOR COMMERCIAL PURPOSES!
Model changelog
v1.2 (Aug 12, 2022)
Finetuned from v1.1 model on:
- 4 hours of the Audiobook dataset
- 2,000 samples from Azure TTS
- High-quality TTS data for Javanese & Sundanese
v1.1 (Aug 6, 2022)
Finetuned from LJSpeech model on:
- 4 hours of the Audiobook dataset
- 2,000 samples from Azure TTS
v1.0 (Jun 23, 2022)
Trained from scratch on:
- 4 hours of the Audiobook dataset
Example
Ardi (Azure):
https://user-images.githubusercontent.com/72781956/183240414-b1127e83-6ddd-427c-b58d-386c377f15b4.mp4
Gadis (Azure):
https://user-images.githubusercontent.com/72781956/183240420-a5d0d335-af4a-4563-a744-40f6795955c5.mp4
Wibowo (Audiobook):
https://user-images.githubusercontent.com/72781956/184360026-c81ac336-c9f1-48ee-97fb-907d66b7f343.mp4
How to use
You need g2p-id to convert graphemes to phonemes before synthesis.
Use the tts command from Coqui TTS to synthesize speech from the phonemized text:
tts --text "saja səˈdanʔ ˈbərada di dʒaˈkarta." \
--model_path checkpoint.pth \
--config_path config.json \
--speaker_idx wibowo \
--out_path output.wav
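For more than one sentence, the same command can be wrapped in a loop. The following is a minimal sketch, not part of the original project: it assumes a hypothetical file phonemes.txt holding one already-phonemized sentence per line, and that checkpoint.pth and config.json are in the current directory. The DRY_RUN switch and the line_N.wav naming are illustrative choices.

```shell
#!/bin/sh
# Sketch: synthesize one WAV file per line of phonemized text.
# Assumptions: phonemes.txt (hypothetical name) contains g2p-id output,
# one sentence per line; checkpoint.pth and config.json are in the cwd.

INPUT="${INPUT:-phonemes.txt}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = call tts

synth_line() {
    # $1 = phoneme string, $2 = index used in the output file name
    if [ "$DRY_RUN" = "1" ]; then
        printf 'tts --text "%s" --model_path checkpoint.pth --config_path config.json --speaker_idx wibowo --out_path line_%s.wav\n' "$1" "$2"
    else
        # Redirect stdin so tts cannot consume the rest of the input file.
        tts --text "$1" \
            --model_path checkpoint.pth \
            --config_path config.json \
            --speaker_idx wibowo \
            --out_path "line_$2.wav" < /dev/null
    fi
}

if [ -f "$INPUT" ]; then
    n=0
    # The extra test keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue   # skip empty lines
        n=$((n + 1))
        synth_line "$line" "$n"
    done < "$INPUT"
else
    echo "input file not found: $INPUT" >&2
fi
```

By default the script only prints the commands it would run; set DRY_RUN=0 once the printed commands look right.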
To list all available speaker indices, pass --list_speaker_idxs:
tts --model_path checkpoint.pth \
--config_path config.json \
--list_speaker_idxs
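Once the speaker names are known, the same sentence can be rendered with each voice for comparison. This is a rough sketch under assumptions: only "wibowo" is confirmed by the command above, while "ardi" and "gadis" are guessed from the example names and should be checked against the --list_speaker_idxs output. The DRY_RUN switch and output file names are illustrative.

```shell
#!/bin/sh
# Sketch: synthesize the same phoneme string once per speaker.
# Assumptions: checkpoint.pth and config.json are in the cwd; "ardi" and
# "gadis" are guessed speaker names, verify them with --list_speaker_idxs.

TEXT="saja səˈdanʔ ˈbərada di dʒaˈkarta."
SPEAKERS="wibowo ardi gadis"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = call tts

synthesize() {
    # $1 = speaker name as reported by --list_speaker_idxs
    speaker="$1"
    wav="output_${speaker}.wav"
    if [ "$DRY_RUN" = "1" ]; then
        printf 'tts --text "%s" --model_path checkpoint.pth --config_path config.json --speaker_idx %s --out_path %s\n' "$TEXT" "$speaker" "$wav"
    else
        tts --text "$TEXT" \
            --model_path checkpoint.pth \
            --config_path config.json \
            --speaker_idx "$speaker" \
            --out_path "$wav"
    fi
}

# Unquoted on purpose: the list is split on whitespace.
for s in $SPEAKERS; do
    synthesize "$s"
done
```

Writing one file per speaker (output_wibowo.wav and so on) keeps earlier results from being overwritten.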
Data
Citations
@misc{https://doi.org/10.48550/arxiv.2106.06103,
doi = {10.48550/ARXIV.2106.06103},
url = {https://arxiv.org/abs/2106.06103},
author = {Kim, Jaehyeon and Kong, Jungil and Son, Juhee},
keywords = {Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering},
title = {Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech},
publisher = {arXiv},
year = {2021},
copyright = {arXiv.org perpetual, non-exclusive license}
}
@inproceedings{kjartansson-etal-tts-sltu2018,
title = {{A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese}},
author = {Keshan Sodimana and Knot Pipatsrisawat and Linne Ha and Martin Jansche and Oddur Kjartansson and Pasindu De Silva and Supheakmungkol Sarin},
booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
year = {2018},
address = {Gurugram, India},
month = aug,
pages = {66--70},
URL = {http://dx.doi.org/10.21437/SLTU.2018-14}
}