NPTEL2020-Indian-English-Speech-Dataset

Need ground-truth transcripts for the train, valid and test datasets

Open kafan1986 opened this issue 2 years ago • 2 comments

I have downloaded the files using the download script. The problem is that, apart from the audio itself, I cannot find the ground-truth transcripts for the audio files. Only audio files are present inside the zipped directory. Am I missing some instructions or steps?

kafan1986, Aug 31 '22 11:08

Hi @kafan1986

Thanks for your interest in this dataset. If I remember correctly, the dataset folder should contain three primary folders: wav, txt and metadata. The wav folder contains the audio clips and the txt folder contains all the transcripts.
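
The layout described above can be checked after extraction with a short script. This is only a sketch: the folder names wav and txt come from the description above, but the idea that each clip and its transcript share a file name (for example `abc.wav` and `abc.txt`) and sit directly inside those folders is an assumption, and `missing_transcripts` is a hypothetical helper, not part of the dataset's tooling.

```python
from pathlib import Path


def stems_in(folder, suffix):
    """Return the set of file stems with the given suffix directly inside folder.

    A missing folder is treated as empty, so an absent txt folder is reported
    as "every clip lacks a transcript" instead of raising an error.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return set()
    return {p.stem for p in folder.iterdir() if p.is_file() and p.suffix == suffix}


def missing_transcripts(dataset_root):
    """List the stems of .wav clips that have no same-named .txt transcript.

    Assumption (not confirmed by the dataset docs): clips and transcripts are
    paired by file name and stored flat in <root>/wav and <root>/txt.
    """
    root = Path(dataset_root)
    return sorted(stems_in(root / "wav", ".wav") - stems_in(root / "txt", ".txt"))
```

Calling `missing_transcripts("path/to/extracted/dataset")` returns an empty list when every clip has a transcript under these assumptions; a long list suggests the txt archive was not downloaded or not extracted.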

I think your data might be only partially downloaded. Please check that every zip file downloaded correctly, then extract them, and let us know if you still face issues.
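
One way to check the zip files before extracting them is Python's standard `zipfile` module. The sketch below assumes the archives are ordinary `.zip` files sitting together in one download directory; `check_zip` and `check_download_dir` are hypothetical helper names, and the download script itself may organise the files differently.

```python
import zipfile
from pathlib import Path


def check_zip(path):
    """Return None if the archive reads cleanly, else a short problem description."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the name of the first
            # one that fails its CRC check, or None if all members are fine.
            bad = zf.testzip()
            return f"corrupt member: {bad}" if bad else None
    except zipfile.BadZipFile as exc:
        # Raised for truncated downloads or files that are not zip archives.
        return f"not a valid zip: {exc}"


def check_download_dir(download_dir):
    """Map each problematic .zip file name in download_dir to its problem."""
    problems = {}
    for path in sorted(Path(download_dir).glob("*.zip")):
        problem = check_zip(path)
        if problem:
            problems[path.name] = problem
    return problems
```

An empty dictionary from `check_download_dir` means every archive opened and passed its checksum test; any file listed there is worth downloading again before extracting.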

Prem-kumar27, Sep 01 '22 09:09

Hi @Prem-kumar27, could you please check whether the train data can currently be downloaded with its texts? It seems that the train part does not include them.

aasmangulyan, Jan 09 '24 07:01