CMU-MultimodalSDK

Extract CMU-MOSEI by self-trained feature extractor without Segmented Video

Open siatwangmin opened this issue 4 years ago • 11 comments


Hi Zadeh, thx for this great work. I want to extract features from the videos using my own trained network, so I need to process the data from scratch. I downloaded the raw data from http://immortal.multicomp.cs.cmu.edu/raw_datasets/CMU_MOSEI.zip, and I use the train/valid/test split from your SDK:

mmdatasdk.cmu_mosei.standard_folds.standard_train_fold
mmdatasdk.cmu_mosei.standard_folds.standard_valid_fold
mmdatasdk.cmu_mosei.standard_folds.standard_test_fold

but I ran into a problem: many files in standard_train_fold can be found in the "CMU_MOSEI/Raw/Videos/Full/Combined" folder but CANNOT be found in the "CMU_MOSEI/Raw/Videos/Segmented/Combined" folder. For example, "hh04W3xXa5s" is in "standard_train_fold" and can be found in "CMU_MOSEI/Raw/Videos/Full/Combined", but it CANNOT be found in "CMU_MOSEI/Raw/Videos/Segmented/Combined".

According to your annotation instructions, there should be 3 files, "hh04W3xXa5s_0", "hh04W3xXa5s_1" and "hh04W3xXa5s_2", in "CMU_MOSEI/Raw/Videos/Segmented/Combined".

I give a brief summary below:

353 train files not in Segmented Videos, like:
['hh04W3xXa5s', '72tXTrSXoMk', 'YVHJpAROBvQ', '-iRBcNs9oI8', 'gZF-YNQHqwI', 'Z3fcd1wdzr0', '6TKaGMkO69E', 'ogGweZUAVtU', '254298', 'RadU51t1kL0', ...]

35 valid files not in Segmented Videos, like:
['icbUzboLcDQ', 'RArhIHk4Qs4', 'lO87-4Kf0cQ', 'SuIcJvERiFk', '102858', 'XIGpuL-Kkyk', '0y022OlZ3W0', 'IRY4D_-mx3Q', 'PboaYD5hlG8', 'c1a1y9ytHH0', ...]

105 test files not in Segmented Videos, like:
['7l3BNtSE0xc', 'dZFV0lyedX4', 'DnBHq5I52LM', 'x266rUJQC_8', 'MYEyQUpMe3k', 'eFV7iFPYZB4', 'BRSyH6yfDLk', 'W1CWpktWtTs', 'SKTyBOhDX6U', 'tC2KicUHB9Q', ...]
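
Counts like these come from a set difference between a fold's video IDs and the IDs that have at least one clip in the Segmented folder. Below is a minimal sketch of that check, assuming the clips are named "<video_id>_<segment>.<ext>" (e.g. hh04W3xXa5s_0.mp4, following the naming in the annotation instructions); ids_without_segments is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path


def ids_without_segments(fold_ids, segmented_dir):
    """Return the fold video IDs that have no clip in segmented_dir.

    Assumption (hypothetical naming): clips are "<video_id>_<index>.<ext>",
    e.g. "hh04W3xXa5s_0.mp4".
    """
    present = set()
    for path in Path(segmented_dir).iterdir():
        if not path.is_file():
            continue
        # Split on the LAST underscore only: many YouTube IDs contain "_"
        # themselves (e.g. "k1ca_xbhohk", "_26JmJnPKfM").
        video_id, sep, index = path.stem.rpartition("_")
        if sep and index.isdigit():
            present.add(video_id)
    # Keep the order of fold_ids in the result.
    return [vid for vid in fold_ids if vid not in present]
```

For example, after from mmsdk import mmdatasdk, calling ids_without_segments(mmdatasdk.cmu_mosei.standard_folds.standard_train_fold, "CMU_MOSEI/Raw/Videos/Segmented/Combined") lists the training IDs with no segmented clip.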

Then I tried to clip the segment videos from "CMU_MOSEI/Raw/Videos/Full/Combined" using the "intervals" in "ALL_Label.csd". Unfortunately, the "intervals" are relative time spans measured from some start point, and I can't find the start point of each video anywhere in the dataset.

So could you provide:

  1. all the segmented videos used in train/valid/test?
  2. the start time point of the "intervals"?

Thx

siatwangmin, Aug 25 '20

Having the same issue here :(

yAya-yns, Aug 26 '20

So how should the raw data be processed to match the SDK? Were those segmented videos dropped?

Redaimao, Aug 26 '20

Hi @siatwangmin, @yAya-yns, @Redaimao

The segmented videos are actually created from the full videos based on the sentence information in the transcript. Each sentence has a beginning and an end, and we cut the video to contain exactly that span. Example: if sentence 0 runs from 1.2 seconds to 10.9 seconds, the video is cropped to that interval. It seems the segmented folder is a bit outdated and does not cover all the newest videos. It should be relatively easy to make the cuts with ffmpeg if you want a quick fix. Hope this helps!
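
A minimal sketch of such a cut, assuming the interval is given in seconds from the start of the full video (as stated above) and that the local ffmpeg build includes the libx264 and AAC encoders; the helper names are hypothetical. Negative starts such as -0.487528344671 are clamped to 0. Placing -ss before -i (input seeking) and re-encoding gives frame-accurate cuts; stream copy (-c copy) would be faster but starts the clip at a keyframe.

```python
import subprocess


def ffmpeg_cut_command(src, dst, start, end):
    """Build an ffmpeg command that cuts [start, end] seconds out of src."""
    start = max(0.0, float(start))  # transcripts can contain negative starts
    duration = float(end) - start
    if duration <= 0:
        raise ValueError(f"empty interval: start={start}, end={end}")
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",    # seek on the input, before -i
        "-i", str(src),
        "-t", f"{duration:.3f}",  # clip length, not an end timestamp
        "-c:v", "libx264",        # re-encode so the cut is frame-accurate
        "-c:a", "aac",
        str(dst),
    ]


def cut_segment(src, dst, start, end):
    """Run the cut; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_cut_command(src, dst, start, end), check=True)
```

For the example above, cut_segment("CMU_MOSEI/Raw/Videos/Full/Combined/<video_id>.mp4", "<video_id>_0.mp4", 1.2, 10.9) would produce a 9.7-second clip.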

A2Zadeh, Sep 02 '20

Unfortunately, the "intervals" are RELATIVE time spans measured from some start point in the video, and I can't find the start point of each video anywhere in the dataset. Can you tell me how to find the ABSOLUTE start point?

siatwangmin, Sep 02 '20

@siatwangmin Actually, the intervals are usually absolute, measured from the beginning of the full video. Could you tell me where you found them to be relative?

A2Zadeh, Sep 02 '20

Let's take hh04W3xXa5s as an example. hh04W3xXa5s is in standard_train_fold but CANNOT BE FOUND in ./Raw/Videos/Segmented/Combined/. I found the full video at ./Raw/Videos/Full/Combined/hh04W3xXa5s.mp4 and the segmented transcript at ./Raw/Transcript/Segmented/Combined/hh04W3xXa5s.txt. The segmented transcript reads:

hh04W3xXa5s___0___-0.487528344671___3.32607709751___This is another of my favorite albums
hh04W3xXa5s___1___2.49569160998___6.11972789116___Now you might wonder, why are you buying your favorite albums again
hh04W3xXa5s___2___5.29931972789___17.7931972789___I have multiple copies of albums like this that I always misplace or that I burned onto one system and then lost the CD

I assume each transcript line can be parsed as:

FILE NAME___ID___START TIME___END TIME___CONTENT
hh04W3xXa5s___0___-0.487528344671___3.32607709751___This is another of my favorite albums
hh04W3xXa5s___1___2.49569160998___6.11972789116___Now you might wonder, why are you buying your favorite albums again
hh04W3xXa5s___2___5.29931972789___17.7931972789___I have multiple copies of albums like this that I always misplace or that I burned onto one system and then lost the CD
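
A minimal parser for this line format, assuming UTF-8 files with exactly these five ___-separated fields; the names are hypothetical. The video ID is matched non-greedily because several IDs contain underscores (e.g. _26JmJnPKfM, k1ca_xbhohk), and the text field takes the rest of the line.

```python
import re
from typing import NamedTuple


class TranscriptSegment(NamedTuple):
    video_id: str
    segment: int
    start: float  # seconds, as written in the transcript
    end: float    # seconds, as written in the transcript
    text: str


# FILE NAME___ID___START TIME___END TIME___CONTENT
_LINE_RE = re.compile(r"^(.+?)___(\d+)___([^_]+)___([^_]+)___(.*)$")


def parse_transcript_line(line):
    match = _LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"unrecognised transcript line: {line!r}")
    vid, seg, start, end, text = match.groups()
    return TranscriptSegment(vid, int(seg), float(start), float(end), text)


def read_transcript(path):
    """Parse every non-empty line of a segmented transcript file."""
    with open(path, encoding="utf-8") as f:
        return [parse_transcript_line(line) for line in f if line.strip()]
```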

So hh04W3xXa5s[0] should span -0.487528344671 s to 3.32607709751 s of ./Raw/Videos/Full/Combined/hh04W3xXa5s.mp4, but we actually found that the sentence starts at 03:08 and ends at 03:11 in that file. The interval from -0.487528344671 to 3.32607709751 is therefore only a RELATIVE time interval, offset by a start at about 03:08. I manually labeled the ABSOLUTE time intervals of hh04W3xXa5s as follows:

hh04W3xXa5s___0___03:08___03:11___This is another of my favorite albums
hh04W3xXa5s___1___03:11___03:14___Now you might wonder, why are you buying your favorite albums again
hh04W3xXa5s___2___03:14___03:26___I have multiple copies of albums like this that I always misplace or that I burned onto one system and then lost the CD
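
To test whether a whole transcript is off by one constant shift, as hh04W3xXa5s appears to be, the manually labeled MM:SS starts can be compared against the relative starts. This sketch (hypothetical helpers) returns the median difference as the offset and the spread of the differences. Since the manual labels have 1-second resolution, a spread of about a second or less is consistent with a single offset. That a constant shift explains the error is an assumption: for 7l3BNtSE0xc in Appendix B the segment IDs skip (0, 1, 5, 14, 15) while the relative intervals stay nearly back to back, so no single offset fits there.

```python
from statistics import median


def mmss_to_seconds(stamp):
    """Convert "MM:SS" (or "HH:MM:SS") to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def estimate_offset(relative_starts, absolute_stamps):
    """Return (offset, spread) of absolute minus relative start times.

    A spread much larger than the labeling resolution means one constant
    shift does not explain the transcript.
    """
    if len(relative_starts) != len(absolute_stamps) or not relative_starts:
        raise ValueError("need equally long, non-empty lists")
    diffs = [mmss_to_seconds(a) - r for r, a in zip(relative_starts, absolute_stamps)]
    return median(diffs), max(diffs) - min(diffs)
```

For hh04W3xXa5s, estimate_offset([-0.487528344671, 2.49569160998, 5.29931972789], ["03:08", "03:11", "03:14"]) gives an offset of about 188.5 s with a spread of about 0.21 s.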

So could you provide the absolute start time of each segmented transcript, so that we can use it to segment the full video? Thx!

siatwangmin, Sep 02 '20

Hi @siatwangmin. This one seems to be an isolated instance, whereas the vast majority of the full videos should start from 0. Let me know if you see other videos with the same problem.

A2Zadeh, Sep 04 '20

Hi @A2Zadeh, thx for your reply. I gathered some statistics myself. Here is what I found:

  1. Yes, the majority of the full videos start from 0, but hh04W3xXa5s is not the only mislabeled sample.
  2. Files that can be found in the segmented videos are correctly labeled.
  3. Many files that cannot be found in the segmented videos are mislabeled, especially some long videos.

So my question is: how should we deal with these mislabeled videos?

1. Here is how I collected the statistics:

Step 1: Check files that CAN be found in the segmented videos:

I sampled 10 files from standard_train_fold, 10 from standard_test_fold and 10 from standard_valid_fold that can be found in the "CMU_MOSEI/Raw/Videos/Full/Combined" folder and CAN also be found in the "CMU_MOSEI/Raw/Videos/Segmented/Combined" folder.

I then checked their segmented transcripts: their time intervals are all correctly labeled.

Step 2: Check files that CANNOT be found in the segmented videos:

I sampled 20 files from standard_train_fold, 20 from standard_test_fold and 20 from standard_valid_fold that can be found in the "CMU_MOSEI/Raw/Videos/Full/Combined" folder but CANNOT be found in the "CMU_MOSEI/Raw/Videos/Segmented/Combined" folder.

I then checked their segmented transcripts; here are the statistics:

Fold                 # Sampled  # Mislabeled  # Correctly labeled
standard_train_fold  20         13            7
standard_valid_fold  20         10            10
standard_test_fold   20         14            6

standard_train_fold
  Mislabeled (13): 6TKaGMkO69E, 72tXTrSXoMk, gZF-YNQHqwI, hh04W3xXa5s, -iRBcNs9oI8, RadU51t1kL0, YVHJpAROBvQ, Z3fcd1wdzr0, 0vXaXWx7Rvo, dqragS38hCk, iREkcXde5ds, Sir2QeCh4B0, wd-LTpCtAzw
  Correctly labeled (7): 254298, ogGweZUAVtU, k1ca_xbhohk, kf3fZcx8nIo, PohW-isYMK0, UK_IXtJ2BqI, xsiHAO0gq74

standard_valid_fold
  Mislabeled (10): 0y022OlZ3W0, icbUzboLcDQ, XIGpuL-Kkyk, Baz0VVQ-E9E, H1DpVwktfDQ, Pt-D0LUHDSc, Q4WzApjaCNI, Q7S0tX4FUNA, SHa756SwGJQ, x2lOwQaAn4g
  Correctly labeled (10): 102858, c1a1y9ytHH0, IRY4D_-mx3Q, lO87-4Kf0cQ, PboaYD5hlG8, RArhIHk4Qs4, SuIcJvERiFk, 243646, 5O1W39o56gg, eOiC1kb17P4

standard_test_fold
  Mislabeled (14): 7l3BNtSE0xc, BRSyH6yfDLk, DnBHq5I52LM, dZFV0lyedX4, eFV7iFPYZB4, MYEyQUpMe3k, SKTyBOhDX6U, tC2KicUHB9Q, W1CWpktWtTs, eJfT7-dDqzA, GXIfrEUJ5d4, lwL4hjhkid4, QnYlpSeVOYo, ZKErPftd--w
  Correctly labeled (6): 8i7u3fl-hP8, AgH84SNRx5s, JATMzuV6sUE, P17tYiqMGRU, UweZVaFqruU, PexNiFbPTYM

Most of these files' time intervals are not correctly labeled; they are just RELATIVE time intervals.

Most of the mislabeled videos also tend to be long videos.
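
The post does not say how the samples were drawn. One reproducible way is to sample with a fixed seed, as in this sketch; sample_per_fold is a hypothetical helper, and folds is assumed to be a dict mapping a fold name to its video IDs (e.g. the lists in Appendix C).

```python
import random


def sample_per_fold(folds, k, seed=0):
    """Draw up to k video IDs from each fold, without replacement.

    IDs are sorted first so the result depends only on the seed and the
    fold contents, not on the order the IDs were listed in.
    """
    rng = random.Random(seed)
    return {
        name: rng.sample(sorted(ids), min(k, len(ids)))
        for name, ids in folds.items()
    }
```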

Appendix A gives statistics of files with and without segmented videos.

Appendix B lists some mislabeled files in the test fold that have no segmented videos, together with my own relabeling.

Appendix C lists all the video names without segmented videos.

2. Appendix

Appendix A: statistics of files with and without segmented videos

Fold                 Total  Segmented  Segmented ratio  Not segmented  Not segmented ratio
standard_train_fold  2249   1902       0.845709204      347            0.154290796
standard_valid_fold  300    265        0.883333333      35             0.116666667
standard_test_fold   678    575        0.848082596      103            0.151917404
All folds            3227   2742       0.849705609      485            0.150294391
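
The ratios are segmented / total and (total - segmented) / total per fold, and the last row sums the three folds (2249 + 300 + 678 = 3227). This short sketch recomputes them from the counts in the table; segmented_ratios is a hypothetical helper.

```python
def segmented_ratios(counts):
    """counts maps fold name -> (total, segmented); adds an "All folds" row.

    Each result row is (total, segmented, segmented ratio,
    not segmented, not segmented ratio).
    """
    rows = dict(counts)
    rows["All folds"] = (
        sum(total for total, _ in counts.values()),
        sum(seg for _, seg in counts.values()),
    )
    return {
        name: (total, seg, seg / total, total - seg, (total - seg) / total)
        for name, (total, seg) in rows.items()
    }
```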

Appendix B: some files in the test fold without segmented videos, with mislabeled samples

For each segment, the first row is the original label and the second row is my own label.

7l3BNtSE0xc:

7l3BNtSE0xc___0___0.0124716553288___1.31950113379___Welcome back.
7l3BNtSE0xc___0___00:00_____________00:01___________Welcome back.

7l3BNtSE0xc___1___1.31950113379___3.17528344671___Scott duPont, Film Producing film and creating a buzz - this is where you're totally through with the post production process.
7l3BNtSE0xc___1___00:01___________00:13___________Scott duPont, Film Producing film and creating a buzz - this is where you're totally through with the post production process.

7l3BNtSE0xc___5___3.17528344671___4.05328798186___We start this very early on.
7l3BNtSE0xc___5___00:49___________00:50___________We start this very early on.

7l3BNtSE0xc___14___4.05328798186___4.94126984127___Stay tuned on Expert Village.
7l3BNtSE0xc___14___01:42___________01:43___________Stay tuned on Expert Village.

7l3BNtSE0xc___15___5.49002267574___6.86689342404___We've got a few more segments on Film Producing 102. 
7l3BNtSE0xc___15___01:43___________01:47___________We've got a few more segments on Film Producing 102. 

BRSyH6yfDLk:

BRSyH6yfDLk___0___-0.487528344671___12.4253968254___"curry on me" and then it happened, in 2004 I landed in New York, and the first thing I realized was reality is the bitch
BRSyH6yfDLk___0___03:42_____________03:55___________"curry on me" and then it happened, in 2004 I landed in New York, and the first thing I realized was reality is the bitch

BRSyH6yfDLk___1___11.8045351474___27.2517006803___The first person to greet me was Indian The immigration officer was Indian customs officer eating his lunch was Indian
BRSyH6yfDLk___1___03:55___________04:09___________The first person to greet me was Indian The immigration officer was Indian customs officer eating his lunch was Indian

BRSyH6yfDLk___2___26.3814058957___31.7714285714___He made me throw all the food my mom had packed for my survival
BRSyH6yfDLk___2___04:09___________04:14___________He made me throw all the food my mom had packed for my survival

BRSyH6yfDLk___3___30.9210884354___46.8172335601___Fuckin hypocrite i had not seen as many Indians in New Delhi as I saw in New York Subway, Dunkin Donuts, Hudson news, Hudson Deli
BRSyH6yfDLk___3___04:14___________04:28___________Fuckin hypocrite i had not seen as many Indians in New Delhi as I saw in New York Subway, Dunkin Donuts, Hudson news, Hudson Deli

BRSyH6yfDLk___4___45.8172335601___48.8925170068___everywhere You know what you guys should try
BRSyH6yfDLk___4___04:28___________04:30___________everywhere You know what you guys should try

DnBHq5I52LM:

DnBHq5I52LM___0___-0.487528344671___2.67755102041___We need tax cuts and tax reform now
DnBHq5I52LM___0___02:59_____________03:02___________We need tax cuts and tax reform now

DnBHq5I52LM___2___3.41360544218___8.05532879819___it's a little bit weird how he moves his mouth
DnBHq5I52LM___2___03:02___________03:06___________it's a little bit weird how he moves his mouth

DnBHq5I52LM___3___7.05532879819___14.989569161___how little he moves his mouth when he talks, like his teeth don't separate
DnBHq5I52LM___3___03:06___________03:13_________how little he moves his mouth when he talks, like his teeth don't separate

DnBHq5I52LM___4___13.989569161___17.7034013605___It's almost like he's the ventriloquist and the dummy at the same time
DnBHq5I52LM___4___03:13___________03:18_________It's almost like he's the ventriloquist and the dummy at the same time

DnBHq5I52LM___5___16.7034013605___20.8063492063___Like, what-what is going on here
DnBHq5I52LM___5___03:18___________03:21_________Like, what-what is going on here

DnBHq5I52LM___6___19.8063492063___24.20861678___That's how I speak to people
DnBHq5I52LM___6___03:21___________03:24_________That's how I speak to people

DnBHq5I52LM___7___25.3637188209___30.7437641723___Uh, now, you might be thinking, "Trevor, I recognize the name Mnuchin, but not from tax news
DnBHq5I52LM___7___03:24___________03:30_________Uh, now, you might be thinking, "Trevor, I recognize the name Mnuchin, but not from tax news

DnBHq5I52LM___8___29.9632653061___36.0616780045___" Well, maybe it's because last month he tried to get you to pay for his honeymoon
DnBHq5I52LM___8___03:30___________03:35_________" Well, maybe it's because last month he tried to get you to pay for his honeymoon

DnBHq5I52LM___9___35.0616780045___48.5632653061___REPORTER: <i>Newly married this past summer,</i> <i> multimillionaire Treasury Secretary Steven Mnuchin</i> <i> formally requested that he and his new wife Louise</i> <i> be allowed to travel in style in a government jet</i> <i> on their honeymoon to Europe
DnBHq5I52LM___9___03:35___________03:48___________REPORTER: <i>Newly married this past summer,</i> <i> multimillionaire Treasury Secretary Steven Mnuchin</i> <i> formally requested that he and his new wife Louise</i> <i> be allowed to travel in style in a government jet</i> <i> on their honeymoon to Europe

DnBHq5I52LM___10___47.5632653061___56.2557823129___</i> <i> At an estimated cost of $25,000 an hour,</i> <i> the price for taxpayers</i> <i>would have been several hundred thousand dollars
DnBHq5I52LM___10___03:48___________03:56___________</i> <i> At an estimated cost of $25,000 an hour,</i> <i> the price for taxpayers</i> <i>would have been several hundred thousand dollars

Appendix C: all video names without segmented videos

347 names in standard_train_fold without segmented videos

['hh04W3xXa5s', '72tXTrSXoMk', 'YVHJpAROBvQ', '-iRBcNs9oI8', 'gZF-YNQHqwI', 'Z3fcd1wdzr0', '6TKaGMkO69E', 'ogGweZUAVtU', '254298', 'RadU51t1kL0', 'iREkcXde5ds', 'k1ca_xbhohk', 'dqragS38hCk', 'PohW-isYMK0', '0vXaXWx7Rvo', 'kf3fZcx8nIo', 'Sir2QeCh4B0', 'UK_IXtJ2BqI', 'xsiHAO0gq74', 'wd-LTpCtAzw', 'fhY2vbnjuWY', '210238', '72385', 'FWBCTZiijEM', 'bUFAN2TgPaU', 'Wo1RxRjXyYw', '08d4NTXkSxw', '0SfejBuLgFo', 'pbFwuNCQlH8', 'ytXVSpPfKwA', 'SH0wXyhsx9s', 'aNOuoSVlunM', 'w8TXP0iz29A', 'ai7G98thPpk', 'G8p4QMjLUXI', 'c7xUcM68IFE', 'jlCFLG6rKKY', 'JkHxzOWOLfs', '5VKMctXBN9M', 'Va54WZgPTdY', 'YcyHQQXGXWA', 'ipLoS44xfO4', '54CM1_GA-uw', 'eY64P27khzk', 'nHMHePX9WoU', 'fbHlBmq7Ipo', '238645', 'Gmhi58erY6k', 'h7p5URoookk', '7EWOMjaKlus', '3aIQUQgawaI', 'jT3FSTBA8Us', 'BZOB3r5AoKE', '7rMLN0KKE5k', 'So3bPzg2bq0', 'QoIsjc4-GIg', 'Tiq67-bAV3M', 'iceSGs0MXaA', 'A8plGi4rbxM', 'L382XZ6iZmM', 'UtuTyW9pUN8', '83859', 'X7-Gbk8gAD0', 'xomMHflvVDw', 'ahcAFnY6iAY', '3DOrdP4H8SA', 'vM3YB7LmMq4', 'Dws8ZrzF7xQ', '9zBj8VkRBpE', 'wj3ur4fsiN4', 'cnllFPRyBFs', 'EoC83JhkCAw', 'HByf7qiO-Kg', '424vOT3Nnyk', 'W-ptEZFARVo', 'hlYDicOj2m0', 'XPoY4LD-A1I', '9sAoeFTKILY', '220134', '1ESU5ONMMxs', '256935', 'JzydLJw6y6o', '69NtV79qgOg', 'q-0gu48ClF4', '4qVvLLFEYnk', 'fzuTEKwNS94', '9K5mYSaoBL4', 'k8NgVOCDYKM', 'SaNXCez-9iQ', 'SGsiGz2fdpo', 'QEG_hkJsaYc', 'JYdfUNjyYxo', 'mxa4KXSz9rw', 'YJJoYkFPmds', '_26JmJnPKfM', 'RST6PgpsLws', '0BVed2nBq1g', 'dbpGH5iP0GE', 'PbyI-sUzLZY', 'Cb8ay6WtJuM', 'Rugd2NMu4bA', 'kY64gXOapYk', '119348', 'cSrM5mHACmA', 'znNt--6itO4', 'RE-LIPOFnrE', 'RvIohowRPAk', '2-4autDbHVQ', '0WIwQgH4lKg', '7sgAWvbLtiM', 'B1PzCwfgXyU', '6ng8CZ0ULK4', '231025', 'HgDdU_RB9UA', '18QjfdhJEM4', 't4-ZulW5np4', 'L6pg3DQKoH4', 'YY2yjEEoB3U', '1F0qH0EEBfo', 'DaXkixKFEvE', '4VA4kqMnEqA', 'tdIZZ9v0IGA', 'DQ7QKE3hzVg', '4hEWns7JBg0', 'wxxqXK9x-64', 'wlrb0HyIs-Q', '8ovb-GaZ3QE', 'W-pA0lOGLR0', 'F5SGdvSRJmo', 'Q2uHXKA8cq8', 'mRnEJOLkhp8', 'sPxoGNvnVzg', 'Rse6laKf1i8', 'dsob2MgUPpA', '0FA32bEZ6xI', 'XyBU_gZtU0M', 
'x9E8yaFCX0Y', 'ATfnMuJJDkk', '_iy7ftq27xE', 'REkPZ77s3Vc', 'khw3YCTFLfs', 'K_5u2Wh_wGk', 'cZVM4svwE90', '0rE8jEvQW_I', 'IW7IssCKgmI', 'sFLTjqVS7AE', '984VkHzXl8w', 'obnOnuzb-Xw', 'wI7DDCRh4Nw', 'VN1IPN1TL3Q', 'O8IJh_L0EfM', 'Cj7R36s4dbM', '4wuug3-5cWo', 'uYzZB4ccG1Q', 'kkaX3okuvjI', 'TJelZLbfT2A', 'krHZuqfXrPU', '1tQOvm5eQOA', '218708', 'HqtvFTa29L8', 'a8WV5KEMKSw', 'PUIsiLk8etk', 'gjEYmdWrBLM', 'mHIvH6Nnrls', '4wdeBJ39Cuw', 'oGnKsroR0R0', 'tXxXhGD1aMo', 'm8tzdtrgFUA', 'o2bNnLOEEC0', 'YLA7h1RTa9w', 'jbJF3aphcP0', 'OdKp0hYomgs', 'Zs8x712Y-CM', '94ULum9MYX0', 'veHYwR7ge6Y', 'nsLACXjD4KU', 'RLHZ6xtIWSM', 'cMUS4nhcKCQ', 'nM5S3xxNOlQ', 'ZQuGizjoCTY', 'xXIq7YPkdUQ', 'yt9GpittX4U', 'kddCewAsxfA', '4Q7OkrJLzAc', 'QLEOYF7Mju0', '4iw1jTY-X3A', '78Ncka6CY70', '9bAgEmihzLs', 'ce1ObTPlI38', '2HmBM3GGTlg', '0EAAfLCQabY', 'MBDZbACupsc', 'yrIXBklJ5YQ', 'kx544gnOZB0', 'C74-of1rjg4', 'lzVA--tIse0', 'z6E-ocntPo4', 'ADYia28RrFU', 'MroQfGehC84', 'c5AJbOd794U', 'EOkdFMw0pmk', 'D-DqVICJXBU', 'lKTeLK8nq8w', '7UlSX-syPeo', 'L5a2ijeWpUI', 'JW3OfSCZlhc', '25t8nrkUfRY', 'MJnc4GUhUuY', 'pAclBdj20ZU', 'efLrpnuLwyU', 'C8Fhlk-eczU', 'YVBsDJtAbk4', 'VcZffwwL0bw', 'l8LTcZJ-_8k', 'BBSVLGf7zPI', 'jfSGWBfVNBI', 'MFrwi-RibUk', 'hCFV5VLgS0A', 'wUf6AvCMHbQ', '1CjUHNYzW1E', 'TxRS6vJ9ak0', '_PKP676ez2c', 'JlmpDdm1A1Y', 'S6S2XL0-ZpM', 'dXMAS2tIknw', 'K9U2TAF9DDY', '5qQs9Cfydo4', 'GRFXnnrQaB4', 'FmIS69vB12I', 'd99mfGIenCo', 'n5J1ZkGUGnU', 'GMHW6XEO1ms', 'fTKe7E_4OCQ', 'YodtMIsjM0c', '89787', 'Itqbc2TBXWY', 'EfQoSpEZa0o', '5Tqu1IXJjGY', 'PMH4mdJeojE', '1A7dqFxx8wU', 'H26MMmrHTlQ', 'bJI7-LnKPH4', 'Wh6sht7xwqQ', '79934', 'VAr7gJydnys', 'cM1Zuji24dI', 'caLWajc18Y4', 'FdTczBnXtYU', 'WTF9xgqLIvI', 'MyfRrTjglm8', 'Ctc23Icfzvg', 'uxHBEfIhY-E', 'b_SffSbPVJM', 'u94VPxEIJPg', 'xvnFF0FiyS4', 'esnOE1LefKA', 'wEs7pJH2mqQ', 'UkqTwAuA8jI', 'TbZPa_keawc', 'SmftC-VAfYQ', 'tnhRxo-g3Xw', '27798', '74184', '4Q95Lg3Icho', '0EphpADwdPg', '244623', 'PybuzkS31J4', 'vwUBtNvjrU4', 'EGA6iulTr00', 'jnG9zog4NCs', 'aE-X_QdDaqQ', 
'nw6Kf3AtCz4', 'rC7qKZMKa0U', 'HxyfY7hsjOg', 'A6R6J8SJtBc', '6zjv1TLqfF0', 'mgovo8VYtvk', 'cHrOisasWLU', 'IsAtAzltJNM', 'ZtuTCuh9C1M', '6G8JJ69aN6o', 'P0WaXnH37uI', 'wz3nVBPVgIA', 'TBCjKGYBNIo', 'yKdIZR5xfcc', 'a5CSKvCAhbs', 'mkkoJ2iVbGs', 'j9mRv472dq0', 'JGEEA_JVriE', 'MsjnLoTKAXo', 'oANUXY3xXKM', 'F59hwsm4Ld0', 'rB6WmWvvyxg', '-mJ2ud6oKI8', 'hlBOP5NskhM', 'pwj9YeMJC08', 'c6wuh0NRG1s', 'UfLMdSGGQVw', 'WygyVNL_qdE', 'zFTuwjr3xq0', 'dQ56b0bqmc8', 'QzdEjKQFisQ', '5RVF_3YBUVI', 'c9jD33baJ60', 'ZZzdvUdOTww', '9dFEGb_RfwE', '8xfW_azeNPk', '0DBfvvIVmqY', 'ixQbCXLUUj8', 'LpbSYaPRTqI', 'E0g6nae4Ae4', 'WUeSV0Z23Kg', 'aE7sckIAWuw', 'nD1R4UedEDo', 'nGah7qST1dI', 'TSDC7CCyeGY', '0OC3wf9x3zQ', '0YdYFtVdlWE', 'd0QNH2vcDgU', '4AQC7_uCuEE', 'zsRTbbKlsEg', 'kqXpf26EL-s', '39lcRCFvV-s', 'FMW6diQ9rMo', 'LHC7ZKJy4Zw', 'L1gXZVU6AE4', 'Gv0kKbfwPpI', '7ZzbemE4QEE', 'jFpRdhl3fgw', 'S9PDgK7fpJI', '1mHjMNZZvFo', '15ktF-6wutk', 'VWik5oKP6IU', 'OBaMn1-x7jQ', 'YRbtXb9fWmI', 'qJZlp9uxoTU', 'R5a3TP-l_n4', 'aCwLQrJz4Bo', 'Zj_4YmbMWtg', 'xk4C4p5vHDk', 'WZVfvgTeFPo', '4jTUcNlxlYA', '24xayGF5yYA', 'ePtlUYCdrNM', 'RRKez9BR-94', 'HuIKyKkEL0Q', 'CfWxp5bRt2A']

35 names in standard_valid_fold without segmented videos

['icbUzboLcDQ', 'RArhIHk4Qs4', 'lO87-4Kf0cQ', 'SuIcJvERiFk', '102858', 'XIGpuL-Kkyk', '0y022OlZ3W0', 'IRY4D_-mx3Q', 'PboaYD5hlG8', 'c1a1y9ytHH0', 'Q4WzApjaCNI', 'eOiC1kb17P4', '243646', 'Baz0VVQ-E9E', 'Pt-D0LUHDSc', 'x2lOwQaAn4g', 'SHa756SwGJQ', 'Q7S0tX4FUNA', 'H1DpVwktfDQ', '5O1W39o56gg', 'unNY4zIk8MM', 'sPremsknoLM', 'QxU625Hn370', 'VbV9S4svrTg', 'rf0yDSeVEUA', 'vo_JbAFAD68', 'WxajvjGKz7Y', 'gVqs4TzqySw', '_aZDaIfGfPo', 'N5xfBtD6rLY', 'y9FyTEyGy5Y', '-hnBHBN8p5A', 'JPT38CA3sGI', 'tQ-CIfgj-Js', 'UlTJmndbGHM']

103 names in standard_test_fold without segmented videos

['7l3BNtSE0xc', 'dZFV0lyedX4', 'DnBHq5I52LM', 'MYEyQUpMe3k', 'eFV7iFPYZB4', 'BRSyH6yfDLk', 'W1CWpktWtTs', 'SKTyBOhDX6U', 'tC2KicUHB9Q', 'PexNiFbPTYM', 'P17tYiqMGRU', 'UweZVaFqruU', 'lwL4hjhkid4', 'AgH84SNRx5s', 'eJfT7-dDqzA', 'JATMzuV6sUE', 'QnYlpSeVOYo', 'GXIfrEUJ5d4', '8i7u3fl-hP8', 'ZKErPftd--w', 'CbRexsp1HKw', 'yBtMwyQFXwA', '3wHE78v9zr4', 'cml9rShionM', 'kLAXmTx2xOA', 'nbru7qLot04', 'zhNksSReaQk', 'E1r0FrFyNTw', 'R9xTBw3MCWI', 'N0d2JL7JC1s', 'YUNxD04EvfE', '259470', 'pnpFPX34Agk', 'qAip3lZRj-g', 'gR3igiwaeyc', 'oHff2W51wZ8', 'qyqVc352g3Q', 'Kn5eKHlPD0k', 'DjcZrtcBZi4', '1HS2HcN2LDo', '5lrDS7LluCA', 'jPtaz1rN6lc', 'kI6jzM_aLGs', 'x8UZQkN52o4', '221153', 'a8UMRrUjavI', 'd-Uw_uZyUys', 'dHk--ExZbHs', 'V2X1NU5RkwY', 'fsBzpr4k3rY', 'eE8Qr9fOvVA', 'lkIe41StoGI', 'ydzNAuqUAnc', 'h1ZZHUU4j0k', 'wnL3ld9bM2o', '8wNr-NQImFg', 'JNhqI4JtPXA', 'lkeVfgI0eEk', 'lYwgLa4R5XQ', 'ZznoGQVwTtw', 'ZS1Nb0OWYNE', 'P0UHzR4CmYg', '93iGT5oueTA', '4dAYMzRyndc', 'zfZUOvZZTuk', 'fWAKek8jA5M', 'kXhJ3hHK9hQ', 'VVtx4IDsHZA', 'wznRBN1fWj4', 'VDkBM0ZG4q8', 'DzdPl68gV5o', 'AHiA9hohKr8', 'kmgsC68hIL8', 'XVWiAArXYpE', '6EDoVEm16fU', 'xXXcgb9eZ9Y', 'Y8dI1GTWCk4', 'U8VYG_g6yVE', 'ChhZna-aBK4', 'kg-W6-hP2Do', '9Lr4i7bIB6w', 'f8Puta8k8fU', 'WBA79Q3e_PU', 'U-KihZeIfKI', 'Wu-wQTmxRgo', 'V0SvSPkiJUY', '97ENTofrmNo', 'bkX5FOX22Tw', 'Rb1uzHNcYcA', 'MHVrwCEWLPI', 'cW-aX4dPVfk', 'VwGPIUNayKM', '5eY4S1F62Z4', 'VLQgw-88v4Q', 'gcFECfN4BCU', 'gLTxaEcx41E', 'ZcFzcd4ZoMg', '-ri04Z7vwnc', 'L-a4Sh6iAcw', 'aa0J1AXSseY', 'wO8fUOC4OSE', '_1nvuNk7EFY', 'PHZIx22aFhU']

siatwangmin, Sep 06 '20

@siatwangmin This is all very interesting. Let me look into this in the coming days and weeks; I need to verify what happened there. Maybe the raw release contains a different video file, perhaps a longer version of the original one. This does not affect the extracted features, because those were computed long ago. The raw videos were recompiled recently, which may be the cause of this inconsistency. You can also check against the segmented videos and see whether the transcription matches the alignment. I will look into this in more detail.
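
One way to run that check on videos that do have segmented clips is to compare each clip's duration, read with ffprobe, against the length of its transcript interval. This is a sketch under assumptions: ffprobe is on PATH, the helper names are hypothetical, the 0.5 s tolerance is arbitrary, and a matching duration is necessary but not sufficient evidence that a clip and its interval line up.

```python
import subprocess


def probe_duration(path):
    """Return a media file's container duration in seconds via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def interval_matches_clip(start, end, clip_duration, tolerance=0.5):
    """True if the interval length agrees with the clip duration."""
    expected = float(end) - float(start)
    return abs(expected - clip_duration) <= tolerance
```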

A2Zadeh, Oct 11 '20

@A2Zadeh Hi, is there any news on this?

siatwangmin, Feb 09 '21

@siatwangmin Hi, have you solved the issues with the raw data? I'm also trying to extract features (visual, acoustic and text) from CMU-MOSEI with self-trained feature extractors; could you help me get started? I have difficulty getting the labels (I can't match Raw/Labels with the data) and choosing which videos to use (e.g. Raw/Videos/Full or Raw/Videos/Segmented).

sklee2014, Mar 17 '21