open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Results: 102 open-speech-corpora issues

https://podcastfillers.github.io/

https://github.com/facebookresearch/facestar

The [XTREME-S](https://huggingface.co/datasets/google/xtreme_s) dataset covers dozens of languages and many hours of speech.

https://www.linguistics.ucsb.edu/research/santa-barbara-corpus#Contents

https://www.kaggle.com/kaiida/kokoro-speech-dataset-v11-small/version/1

https://huggingface.co/datasets/z-uo/male-LJSpeech-italian

https://github.com/Toloka/CrowdSpeech

ShEMO: a large-scale validated database for Persian speech emotion detection

Abstract: This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000...

KsponSpeech is a large-scale spontaneous speech corpus of Korean conversations. This corpus contains 969 hrs of general open-domain dialog utterances, spoken by about 2,000 native Korean speakers in a clean...