GigaSpeech
GigaSpeech on HuggingFace
The GigaSpeech dataset is now available on the HuggingFace Hub.
Highlights of GigaSpeech on HuggingFace
- Easy to use (a two-liner in Python)
- Smoother and faster downloading from the US and EU, with support for on-the-fly downloading during training
- Preprocessed:
  - decompressed
  - short audio files (.wav) are segmented and extracted from the raw long audio
  - supervisions are extracted from the raw metadata.json
- Subsets can be downloaded separately (e.g. XS/S/M/L/XL for training, DEV/TEST for benchmarking)
- Users can listen to audio samples via HuggingFace's dataset viewer
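
As a rough illustration of the "two-liner" and on-the-fly downloading mentioned in the highlights, the sketch below wraps the load call in a small helper. The repository id `speechcolab/gigaspeech`, the lowercase config names, and the helper name `load_gigaspeech` are assumptions not stated in this announcement; check them against the dataset card. `token=True` assumes a recent `datasets` release (older releases used `use_auth_token=True`) and that access has already been granted and a login token is stored locally.

```python
# Subset names come from the announcement; the lowercase config names and the
# repository id are assumptions to verify against the dataset card.
REPO_ID = "speechcolab/gigaspeech"  # assumed Hub repository id
SUBSETS = ("xs", "s", "m", "l", "xl", "dev", "test")


def load_gigaspeech(subset="xs", streaming=True):
    """Load one GigaSpeech subset (hypothetical helper, not an official API).

    With streaming=True, samples are fetched on the fly while iterating
    instead of downloading the whole subset up front.
    """
    name = subset.lower()
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    # Imported lazily so the helper can be defined without the package
    # installed; requires `pip install datasets`.
    from datasets import load_dataset

    # token=True reuses the credential saved by `huggingface-cli login`.
    return load_dataset(REPO_ID, name, streaming=streaming, token=True)
```

Calling `load_gigaspeech("xs")` would then return the smallest training subset in streaming mode, while `load_gigaspeech("test", streaming=False)` would download the test subset in full for benchmarking.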
How-to
- Step 1: Fill in the GigaSpeech application form from SpeechColab.
- Step 2: Go to the GigaSpeech page on HuggingFace and follow the instructions there.
Useful links
Credits
Many thanks to the Dataset Team & Speech Team at HuggingFace, particularly @polinaeterna, @patrickvonplaten, and @sanchit-gandhi. Thanks to them, GigaSpeech is now more accessible to the entire speech community!