Datasets from Jaco-Assistant

Open JRMeyer opened this issue 3 years ago • 0 comments

https://gitlab.com/Jaco-Assistant/Scribosermo/-/blob/master/preprocessing/README.md#datasets

German:

| Dataset | Source | Duration | Notes |
|---|---|---|---|
| Alcohol Language Corpus | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ALC/ALC.4.php | ~48h | |
| BAS-Formtask | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/FORMTASK/FORMTASK.2.php | ~18h | |
| BAS-Sprecherinnen | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SprecherInnen/SprecherInnen.1.php | ~2h | |
| Brothers | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/BROTHERS/BROTHERS.2.php | ~7h | |
| Common Voice | https://voice.mozilla.org/ | ~777h | |
| Common Voice Single Words | https://voice.mozilla.org/ | ~9h | included in the main dataset |
| CSS10 | https://www.kaggle.com/bryanpark/german-single-speaker-speech-dataset | ~16h | |
| GoogleWavenet | | ~165h | artificial training data generated with the google text to speech service |
| Gothic | | ~39h | extracted from Gothic 1-3 games |
| Guild2-Renaissance | https://www.gog.com/game/the_guild_2_renaissance | ~11h | |
| Hempel | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/HEMPEL/HEMPEL.4.php | ~25h | |
| Kurzgesagt | https://www.youtube.com/c/KurzgesagtDE/videos | ~9h | |
| LinguaLibre | https://lingualibre.org/wiki/LinguaLibre:Main_Page | ~4h | |
| M-AILABS Speech Dataset | https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ | ~234h | |
| Multilingual LibriSpeech | http://www.openslr.org/94/ | ~1995h | |
| Multilingual TEDx | http://www.openslr.org/100/ | ~14h | |
| MussteWissen | Deutsch: https://www.youtube.com/c/musstewissenDeutsch/videos, Mathe: https://www.youtube.com/c/musstewissenMathe/videos, Physik: https://www.youtube.com/c/musstewissenPhysik/videos, Chemie: https://www.youtube.com/c/musstewissenChemie/videos | ~11h | |
| PhattSessionz | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/PHATTSESSIONZ/PHATTSESSIONZ.2.php | ~238h | |
| PhoneDat 1 | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/PD1/PD1.3.php | ~21h | |
| PULS-Reportage | https://www.youtube.com/puls/videos | ~16h | |
| Regional Variants of German | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/RVG1_CLARIN/RVG1_CLARIN.3.php | ~129h | |
| RVG - Juveniles | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/RVG-J/RVG-J.2.php | ~49h | |
| SC10 | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SC10/SC10.4.php | ~6h | |
| Smartweb Handheld Corpus | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SHC/SHC.2.php | ~29h | |
| SI100 | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SI100/SI100.2.php | ~36h | |
| Skyrim Legacy+DLCs | https://store.steampowered.com/app/72850/The_Elder_Scrolls_V_Skyrim/ | ~89h | |
| Smartweb Motorbike Corpus | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SMC/SMC.2.php | ~6h | |
| Spoken Wikipedia Corpora | https://nats.gitlab.io/swc/ | ~248h | |
| Tatoeba | https://tatoeba.org/deu/sentences/search?query=&from=deu&to=und&user=&orphans=no&unapproved=no&has_audio=yes&tags=&list=&native=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort_reverse=&sort=relevance | ~8h | |
| Thorsten | http://www.openslr.org/95/ | ~23h | |
| TerraX | https://www.youtube.com/c/terra-x/videos | ~48h | |
| TUDA | https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/acoustic-models.html | ~185h | |
| Verbmobil 1 | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/VM1/VM1.3.php | ~34h | |
| Verbmobil 2 | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/VM2/VM2.3.php | ~22h | |
| Voxforge | http://www.voxforge.org/home/forums/other-languages/german/open-speech-data-corpus-for-german | ~33h | |
| WaSeP | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/WaSeP/WaSeP.2.php | ~3h | |
| Witcher3-GOTY | https://www.gog.com/game/the_witcher_3_wild_hunt_game_of_the_year_edition | ~44h | |
| Y-Kollektiv | https://www.youtube.com/c/ykollektiv/videos | ~58h | |
| Zamia-Speech | https://goofy.zamia.org/zamia-speech/corpora/zamia_de/ | ~19h | |
| ZipTel | https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZIPTEL/ZIPTEL.3.php | ~13h | |

JRMeyer, Jun 22 '21 11:06