emotion_detection_cpc

look for help

Open wanghl21 opened this issue 3 years ago • 7 comments

Hi! I want to know more about your project, but I can't access your blog link https://bit.ly/2GpcT5P. Could you please upload it again? Thanks! :)

wanghl21 avatar Feb 16 '22 15:02 wanghl21

Hi, here is the medium link: https://medium.com/speechmatics/boosting-emotion-recognition-performance-in-speech-using-cpc-ce6b23a05759 Let me know if you have any questions.

jplhughes avatar Feb 16 '22 17:02 jplhughes

Thanks for your reply!! I have a question about the 100-hour subset of the Librispeech dataset. How did you process this dataset into train and val dbl files? Thank you very much for your help!

wanghl21 avatar Feb 17 '22 08:02 wanghl21

I don't have a script in this repo to prepare the dbl for librispeech but it should be straightforward. Download the training files from here and then just get each filepath in a dbl file on a new line. Using find on the command line will do the job in creating the dbl file:

    find $unzipped_librispeech_dir -name "*.wav" > $dbl_path

You can see how the dbls are loaded here.

jplhughes avatar Feb 17 '22 13:02 jplhughes

Hi! I'm still not quite sure about this. Could you be a little more specific about how to handle the librispeech dataset? Do I need to rename the librispeech dataset to the RAVDESS format to get the dbl files? I would be grateful for an early reply!

skewed-c avatar Sep 13 '22 02:09 skewed-c

Hi @skewed-c, if you download librispeech you should end up with a bunch of audio files in a directory on your machine. You then just need to get all the paths to each audio file in a ".dbl" file. A dbl file looks like this:

/Path/to/file1.wav
/Path/to/file2.wav
...
/Path/to/fileN.wav

You can create this by using 'find' on the command line.
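For anyone who prefers Python to `find`, here is a minimal sketch of the same idea that also splits the paths into a train and a val dbl. It is not a script from this repo: the function names, the `val_fraction` default and the random split are all assumptions made for illustration, and the thread does not say how the original train/val split was made. The extension defaults to `.flac` because that is how LibriSpeech is distributed; pass `ext=".wav"` if you converted the files.

```python
import random
from pathlib import Path


def write_dbl(paths, dbl_path):
    """Write one audio file path per line, which is all a dbl file contains."""
    with open(dbl_path, "w") as f:
        for p in paths:
            f.write(f"{p}\n")


def make_train_val_dbls(audio_dir, train_dbl, val_dbl,
                        ext=".flac", val_fraction=0.05, seed=0):
    """Collect audio files under audio_dir and split them into two dbl files.

    val_fraction and the random shuffle are illustrative choices,
    not the split used by the repo authors.
    """
    # Recursively find every file with the given extension, as absolute paths.
    paths = sorted(str(p.resolve()) for p in Path(audio_dir).rglob(f"*{ext}"))
    if not paths:
        raise ValueError(f"no '{ext}' files found under {audio_dir}")

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(paths)
    n_val = max(1, int(len(paths) * val_fraction))

    write_dbl(paths[n_val:], train_dbl)
    write_dbl(paths[:n_val], val_dbl)
    return len(paths) - n_val, n_val
```

Calling `make_train_val_dbls("/path/to/LibriSpeech", "train.dbl", "val.dbl")` (paths hypothetical) returns the number of train and val entries written. No class labels are involved, so no renaming is needed.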

This stage has nothing to do with the emotion ID dataset as there are no classes involved.

Hope that helps and let me know if you have more questions.

John

jplhughes avatar Sep 20 '22 21:09 jplhughes

Thanks for your timely reply, John! I will give it a try as you advised. BTW, the original LibriSpeech dataset is in '.flac' format. Should I convert it to '.wav' before I train the CPC model, or can I just use the original '.flac' files to train the model?

skewed-c avatar Sep 21 '22 12:09 skewed-c

You can give .flac a try; just make sure torchaudio.load can handle that file type (since that is what loads the audio in the dataloader).
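One quick way to check this before starting a long training run is to try loading a few entries from the dbl. The helper below is only a sketch: `check_loadable` and its `load_fn` and `limit` parameters are hypothetical names, not part of this repo. By default it calls `torchaudio.load(path)`, which returns a `(waveform, sample_rate)` pair, and whether .flac decodes depends on the audio backend installed alongside torchaudio.

```python
def check_loadable(dbl_path, load_fn=None, limit=5):
    """Try loading the first `limit` files listed in a dbl file.

    Returns a list of (path, error) pairs for files that failed to load.
    An empty list means every checked file loaded without raising.
    """
    if load_fn is None:
        # Imported lazily so the helper can be exercised without torchaudio.
        import torchaudio
        load_fn = torchaudio.load

    # A dbl file is one path per line; skip blank lines.
    with open(dbl_path) as f:
        paths = [line.strip() for line in f if line.strip()]

    failures = []
    for path in paths[:limit]:
        try:
            load_fn(path)
        except Exception as err:  # report any decoding or I/O error
            failures.append((path, repr(err)))
    return failures
```

If `check_loadable("train.dbl")` returns an empty list, the first few files load fine and the dataloader should be able to read them too.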

jplhughes avatar Sep 21 '22 17:09 jplhughes