emotion_detection_cpc
Looking for help
Hi! I want to know more about your project, but I can't access your blog link https://bit.ly/2GpcT5P. Could you please upload it again? Thanks! :)
Hi, here is the Medium link: https://medium.com/speechmatics/boosting-emotion-recognition-performance-in-speech-using-cpc-ce6b23a05759
Let me know if you have any questions.
Thanks for your reply! I have a question about the 100-hour subset of the LibriSpeech dataset. How did you process this dataset into train and val dbl files? Thank you very much for your help!
I don't have a script in this repo to prepare the dbl for LibriSpeech, but it should be straightforward. Download the training files from here and then put each file path on a new line in a dbl file. Using find on the command line will do the job of creating the dbl file:
find $unzipped_librispeech_dir -name "*.wav" > $dbl_path
You can see how the dbls are loaded here.
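For reference, here is a minimal sketch of what reading a dbl could look like, assuming it is just a newline-separated list of audio paths (the repo's actual loader may differ):

def read_dbl(dbl_path):
    # A dbl file holds one audio file path per line; skip blank lines.
    with open(dbl_path) as f:
        return [line.strip() for line in f if line.strip()]

# e.g. train_files = read_dbl("train.dbl")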
Hi! I'm still not quite sure about this. Could you be a little more specific about how to handle the LibriSpeech dataset? Do I need to rename the LibriSpeech dataset to the RAVDESS format to get the dbl files? I would be grateful for your early reply!
Hi @skewed-c, if you download LibriSpeech you should end up with a bunch of audio files in a directory on your machine. You then just need to get the path to each audio file into a ".dbl" file. A dbl file looks like this:
/Path/to/file1.wav
/Path/to/file2.wav
...
/Path/to/fileN.wav
You can create this by using 'find' on the command line.
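If you are not on a Unix shell, a rough Python equivalent is the sketch below (the directory and extension are placeholders; adjust them to your download):

from pathlib import Path

librispeech_dir = Path("/data/LibriSpeech/train-clean-100")  # placeholder: wherever you extracted LibriSpeech
paths = sorted(str(p) for p in librispeech_dir.rglob("*.flac"))  # use "*.wav" if you converted the audio
Path("train.dbl").write_text("\n".join(paths) + "\n")  # one path per line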
This stage has nothing to do with the emotion ID dataset as there are no classes involved.
Hope that helps and let me know if you have more questions.
John
Thanks for your timely reply, John! I will give it a try as you advised. BTW, the original LibriSpeech dataset is in '.flac' format; should I convert it to '.wav' before I train the CPC model, or can I just use the original LibriSpeech dataset (in '.flac' format) to train the model?
You can give .flac a try; just make sure torchaudio.load can deal with that file type (since that is what loads the audio in the dataloader).
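A quick sanity check that your torchaudio install can read .flac might look like this (the file path is just a placeholder for any file in your LibriSpeech download):

import torchaudio

# Placeholder path; point this at any .flac file from LibriSpeech.
waveform, sample_rate = torchaudio.load("/data/LibriSpeech/train-clean-100/19/198/19-198-0001.flac")
print(waveform.shape, sample_rate)  # LibriSpeech audio is 16 kHz mono

If torchaudio.load errors on .flac, it is likely a backend issue; depending on your torchaudio version you may need sox or soundfile support installed.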