
GigaSpeech on HuggingFace

Open dophist opened this issue 2 years ago • 2 comments

The GigaSpeech dataset is now available on the HuggingFace Hub.

Highlights of GigaSpeech on HuggingFace

  • easy to use (a two-liner in Python)
  • smoother and faster downloads from the US and EU, with support for on-the-fly downloading during training
  • preprocessed:
    • decompressed
    • short audio files (.wav) are segmented and extracted from the raw long audio
    • supervisions are extracted from the raw metadata.json
  • subsets can be downloaded separately (e.g. XS/S/M/L/XL for training, DEV/TEST for benchmarking)
  • users can listen to audio samples via HuggingFace's dataset viewer

How-to
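
A minimal sketch of the "two-liner" and the on-the-fly option mentioned in the highlights, using `load_dataset` from the `datasets` library. The repository id `speechcolab/gigaspeech`, the lowercase subset names, and the helper functions `normalize_subset` and `load_gigaspeech` are assumptions for illustration, not something stated in this issue; check the dataset card for the exact names. Access may also require accepting the dataset's terms and logging in to the Hub first, and newer `datasets` versions may need extra arguments.

```python
# Sketch only: the repo id and subset names are assumptions, not taken from the issue.
ASSUMED_REPO_ID = "speechcolab/gigaspeech"
ASSUMED_SUBSETS = ("xs", "s", "m", "l", "xl", "dev", "test")


def normalize_subset(name: str) -> str:
    """Lowercase a subset name such as 'XS' and check it against the assumed list."""
    subset = name.strip().lower()
    if subset not in ASSUMED_SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {ASSUMED_SUBSETS}")
    return subset


def load_gigaspeech(subset: str = "xs", streaming: bool = False):
    """Load one subset; streaming=True fetches samples on the fly instead of
    downloading the whole subset up front."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(ASSUMED_REPO_ID, normalize_subset(subset), streaming=streaming)


# Example usage (requires network access and Hub authentication):
# gs = load_gigaspeech("xs")                   # download the smallest training subset
# gs_stream = load_gigaspeech("xl", streaming=True)  # iterate without a full download
```

Starting with the smallest subset is a cheap way to verify access and inspect the sample format before committing to a larger download.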

Useful links

Credits

Many thanks to the Dataset Team and Speech Team at HuggingFace, particularly @polinaeterna, @patrickvonplaten, and @sanchit-gandhi. Thanks to them, GigaSpeech is now more accessible to the entire speech community!

dophist, Jun 27 '22 14:06