Add CHiME4 dataset

Open patrickvonplaten opened this issue 4 years ago • 4 comments

Adding a Dataset

  • Name: Chime4
  • Description: Chime4 is a dataset for automatic speech recognition. It is especially useful for evaluating models in noisy environments and for multi-channel ASR.
  • Paper: Dataset comes from a challenge: http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME4/ . Results paper:
  • Data: http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME4/download.html
  • Motivation: So far there are very few speech datasets in datasets; only librispeech_asr is available.

If interested in tackling this issue, feel free to tag @patrickvonplaten

Instructions to add a new dataset can be found here.

patrickvonplaten avatar Feb 08 '21 12:02 patrickvonplaten

@patrickvonplaten I'm not sure whether this is still needed, but I'm willing to tackle this issue.

KossaiSbai avatar Dec 26 '23 18:12 KossaiSbai

Hey @patrickvonplaten, I have managed to download the zip from here and successfully uploaded all the files to a Hugging Face dataset:

https://huggingface.co/datasets/ksbai123/Chime4

However I am getting this error when trying to use the dataset viewer:

[Image: Screenshot 2023-12-27 at 18 40 59]

Can you take a look and let me know if I have missed any files, please?

KossaiSbai avatar Dec 27 '23 17:12 KossaiSbai

@patrickvonplaten ?

KossaiSbai avatar Jan 31 '24 17:01 KossaiSbai

Hi @KossaiSbai,

Thanks for your contribution.

As the issue is not strictly related to the datasets library, but to the specific implementation of the CHiME4 dataset, I have opened an issue in the Discussion tab of the dataset: https://huggingface.co/datasets/ksbai123/Chime4/discussions/2 Let's continue the discussion there!

albertvillanova avatar Feb 01 '24 10:02 albertvillanova