GigaSpeech issues

About the GigaSpeech GLM file

2

Hi, does GigaSpeech provide a GLM file, like Switchboard's en20000405_hub5.glm, containing the transcript filtering rules? I noticed there are some rules in the gigaspeech_scoring.py file, but do you have the...

CuiMingyu

documentation

Fix Athena data prep bug

Fixing the bug mentioned in https://github.com/SpeechColab/GigaSpeech/issues/103

dophist

How can I resume a download from the point of disconnection?

2

Hello, the download failed because the network connection dropped while the audio data was downloading. How can I resume the download from the point where it stopped?

guo453585719

documentation

GigaSpeech on HuggingFace

2

GigaSpeech dataset is now available on HuggingFace Hub.

---

### Highlights of GigaSpeech on HuggingFace

* easy to use (a two-liner in python)
* Smoother and faster downloading from US...

dophist

documentation

Is the XL subset the 33,000-hour unlabeled data?

1

Hi, as mentioned in the README, GigaSpeech contains "33,000+ hours for unsupervised/semi-supervised learning". I am trying to use this unlabeled data, and I have already downloaded the XL subset. But...

mct10

documentation

Mismatched sample rate in Opus files

1

Hello, I noticed that `sample_rate=16000` in `GigaSpeech.json` does not match the sample rate reported for the Opus file (`SR=48000`): ``` ffmpeg -i /workspace/datasets/GigaSpeech_corpus/audio/podcast/P0001/POD0000000001.opus ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers...

aheba

documentation

Metadata conversion (json -> jsonl)

See the discussion here: https://github.com/SpeechColab/PySpeechColab/pull/2. We should change https://github.com/SpeechColab/GigaSpeech/blob/main/utils/download_meta.sh to include the json->jsonl conversion. The conversion command is `jq -c '.audios[]' GigaSpeech.json > GigaSpeech.jsonl`. See examples here for...

chenguoguo

documentation

enhancement
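
For readers without `jq`, the json -> jsonl conversion quoted in the issue above can be approximated in Python. This is only a minimal sketch, not the project's tooling: the function name `json_to_jsonl` and its parameters are hypothetical, and it assumes the metadata file is a single JSON object whose `audios` key holds a list, as the `jq` filter `.audios[]` implies. It loads the whole file into memory, which may be impractical for a very large metadata file.

```python
import json


def json_to_jsonl(src_path, dst_path, key="audios"):
    """Write each element of the top-level `key` array as one compact JSON line.

    Hypothetical helper mirroring `jq -c '.audios[]' in.json > out.jsonl`.
    Returns the number of lines written.
    """
    # Assumption: the file fits in memory; json.load reads it all at once.
    with open(src_path, encoding="utf-8") as src:
        meta = json.load(src)

    count = 0
    with open(dst_path, "w", encoding="utf-8") as dst:
        for entry in meta[key]:
            # Compact separators and unescaped non-ASCII text roughly match
            # jq's -c output.
            line = json.dumps(entry, ensure_ascii=False, separators=(",", ":"))
            dst.write(line + "\n")
            count += 1
    return count
```

A call such as `json_to_jsonl("GigaSpeech.json", "GigaSpeech.jsonl")` would then produce one audio entry per line, which can be read back lazily with `json.loads` on each line.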

Number of Words in Test Set

4

Hi, is there an official count of the words in the test set used for scoring? WeNet results say there are 19928 sentences and 390656 words: https://github.com/wenet-e2e/wenet/tree/main/examples/gigaspeech/s0 Kaldi...

xiaoxiao26-zz