Multimodal-Transformer Modal dimensions (audio, video) for the MOSI are different from values reported in paper?

Open souravBhat opened this issue 3 years ago • 1 comments

Hi,

I am noticing a discrepancy in model performance between runs on the latest version of MOSI from the CMU Multimodal SDK and the numbers reported in the paper. Upon digging further, it turns out that the audio and video dimensions in the data provided with this repo (5 and 20 respectively) differ from the values reported in Appendix D of the paper (35 and 74 respectively). Could you please comment on the differences? Am I missing something?
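For anyone who wants to reproduce the check, here is a minimal sketch of how the per-modality feature dimensions could be compared against the paper's values. It assumes each data split is a dict mapping a modality name to an array whose last axis is the feature dimension. The key names `audio` and `vision`, the helper names, and the stand-in shapes are all hypothetical and not taken from the repo.

```python
from types import SimpleNamespace


def feature_dims(split):
    """Return the size of the last axis for every array-like in `split`."""
    return {name: arr.shape[-1] for name, arr in split.items()}


def compare_dims(observed, expected):
    """Return {modality: (observed, expected)} for every mismatch."""
    return {
        name: (observed.get(name), dim)
        for name, dim in expected.items()
        if observed.get(name) != dim
    }


# Stand-in for one split so the sketch runs without the real file.
# Only the last axis (5 and 20) comes from this thread; the leading
# axes and the key names are made up for illustration.
fake_split = {
    "audio": SimpleNamespace(shape=(10, 50, 5)),
    "vision": SimpleNamespace(shape=(10, 50, 20)),
}

# Audio and video dimensions as quoted above from Appendix D of the paper.
paper_dims = {"audio": 35, "vision": 74}

observed = feature_dims(fake_split)
mismatches = compare_dims(observed, paper_dims)
print("observed:", observed)
print("mismatches (observed, paper):", mismatches)
```

With the real data, the stand-in would be replaced by one split of the loaded file, provided its arrays expose a `.shape` attribute (as NumPy arrays do).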

Sep 03 '21 22:09 souravBhat

Could you share the dataset you downloaded with me? I can't download the dataset shared by the author.

Sep 13 '21 03:09 yehaizhi
