CMU-MultimodalSDK

~3k MOSEI videos missing between the raw data and the CSD video IDs

Open · Dayire opened this issue 2 years ago · 0 comments

Hello!

I'm extracting my own feature set from the raw CMU-MOSEI data, and I take the sentiment/emotion labels from the SDK (using this tutorial). I then compare the set of video/utterance IDs that come from the SDK with the set of video/utterance IDs from the features I extracted from the raw data.
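For anyone reproducing this comparison, here is a minimal sketch of the ID matching step. It assumes (this is not confirmed by the SDK docs here) that SDK segment keys look like `<video_id>[<segment_index>]` and that the segmented raw files are named `<video_id>_<segment_index>`; the helper names are hypothetical. Splitting the missing IDs by whole video helps separate download problems (every segment of a video missing) from segmentation problems (only some segments missing).

```python
import re
from collections import defaultdict

# ASSUMED key format for SDK-side segment IDs: "<video_id>[<segment_index>]".
SDK_KEY = re.compile(r"^(?P<vid>.+)\[(?P<seg>\d+)\]$")


def normalize_sdk_key(key: str) -> tuple[str, int]:
    """Turn an (assumed) SDK key like 'abc[3]' into ('abc', 3)."""
    m = SDK_KEY.match(key)
    if m is None:
        raise ValueError(f"unexpected SDK key: {key!r}")
    return m.group("vid"), int(m.group("seg"))


def normalize_file_stem(stem: str) -> tuple[str, int]:
    """Turn an (assumed) file stem like 'abc_3' into ('abc', 3).

    rsplit is used because YouTube video IDs can themselves contain '_'.
    """
    vid, seg = stem.rsplit("_", 1)
    return vid, int(seg)


def compare_ids(sdk_keys, file_stems) -> dict:
    """Compare labelled segments (SDK) with segments that have extracted features."""
    sdk = {normalize_sdk_key(k) for k in sdk_keys}
    raw = {normalize_file_stem(s) for s in file_stems}
    return {
        "matched": sdk & raw,
        "missing_features": sdk - raw,      # labelled, but no features extracted
        "unlabelled_features": raw - sdk,   # features extracted, but no label
    }


def missing_by_video(missing) -> dict[str, list[int]]:
    """Group missing (video_id, segment) pairs by video to spot whole-video gaps."""
    grouped = defaultdict(list)
    for vid, seg in missing:
        grouped[vid].append(seg)
    return {vid: sorted(segs) for vid, segs in grouped.items()}


if __name__ == "__main__":
    report = compare_ids(["abc_1[0]", "abc_1[1]", "xyz[2]"], ["abc_1_0", "xyz_2"])
    print(len(report["matched"]), missing_by_video(report["missing_features"]))
```

Running the same comparison separately on the train/val/test ID lists would give per-split counts like the ones below.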

The problem is that the two sets don't fully match: I can only match train: 13005, val: 1641, and test: 3752 IDs, for a total of 18398 samples. This only happens for the audio/video modalities; text is fine. I did find some corrupt segmented videos, and re-running ffmpeg on them according to the timestamps helped (that is what got me to 18k), but I am still far from the ~21k samples.
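The re-segmentation step mentioned above could look roughly like the sketch below. It is an assumption-laden illustration, not the SDK's own procedure: the start/end times are assumed to be in seconds, the helper names are hypothetical, and the clip is re-encoded (`libx264`/`aac`) rather than stream-copied, because `-c copy` can only cut on keyframes and tends to produce empty or broken short clips. The success check (exit code plus non-empty output file) is a cheap heuristic for catching corrupt segments.

```python
import subprocess
from pathlib import Path


def build_cut_command(src: Path, dst: Path, start: float, end: float) -> list[str]:
    """Build an ffmpeg argv that cuts [start, end) seconds from src into dst.

    Timestamps are ASSUMED to be in seconds; the segment is re-encoded
    so the cut does not snap to the nearest keyframe.
    """
    if end <= start:
        raise ValueError(f"empty interval {start}-{end} for {src}")
    return [
        "ffmpeg", "-y",             # overwrite a previously corrupt output
        "-ss", f"{start:.3f}",      # seek in the input before decoding
        "-i", str(src),
        "-t", f"{end - start:.3f}", # segment duration
        "-c:v", "libx264",
        "-c:a", "aac",
        str(dst),
    ]


def cut_segment(src: Path, dst: Path, start: float, end: float) -> bool:
    """Run ffmpeg and report whether a non-empty output file was produced."""
    result = subprocess.run(build_cut_command(src, dst, start, end), capture_output=True)
    return result.returncode == 0 and dst.exists() and dst.stat().st_size > 0
```

Segments for which `cut_segment` returns `False` even after a retry are worth listing separately, since they likely point to a problem with the source video rather than with the cutting.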

Can you advise?

Thanks! Rachid.

Dayire · Jul 06 '21 09:07