Multimodal-Transformer
Modality dimensions (audio, video) for MOSI are different from the values reported in the paper?
Hi,
I am noticing a discrepancy in model performance between runs on the latest version of MOSI from the CMU multimodal SDK and the numbers reported in the paper. On digging further, it turns out that the audio and video feature dimensions in the data provided with this repo (5 and 20, respectively) differ from the values reported in Appendix D of the paper (35 and 74, respectively). Could you please comment on the difference? Am I missing something?
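For anyone who wants to reproduce the check, here is a minimal sketch of how the feature dimensions could be compared. It assumes the data loads into a nested dict of the form `{split: {modality: array}}` with the feature axis last; that layout, the key names `audio` and `vision`, and the helper names `feature_dims` and `mismatches` are my own assumptions for illustration, not something confirmed by the repo. The stand-in objects only mimic a NumPy-style `.shape`, and the expected values are the ones quoted above from Appendix D.

```python
from types import SimpleNamespace


def feature_dims(dataset):
    """Map each split and modality to the size of its last (feature) axis."""
    return {
        split: {name: arr.shape[-1] for name, arr in modalities.items()}
        for split, modalities in dataset.items()
    }


def mismatches(dims, expected):
    """List (split, modality, found, expected) for every dimension that differs."""
    return [
        (split, name, found, expected[name])
        for split, per_modality in dims.items()
        for name, found in per_modality.items()
        if name in expected and found != expected[name]
    ]


# Hypothetical stand-ins with a NumPy-style .shape of (samples, time, features).
# Replace these with the arrays actually loaded from the dataset file.
fake = {
    "train": {
        "audio": SimpleNamespace(shape=(4, 50, 5)),
        "vision": SimpleNamespace(shape=(4, 50, 20)),
    }
}

dims = feature_dims(fake)

# Values quoted in this post from Appendix D of the paper.
paper = {"audio": 35, "vision": 74}
report = mismatches(dims, paper)

print(dims)
print(report)
```

With real data, the same two functions should work on any arrays that expose `.shape`, so the loaded dict can be passed in directly if it follows the assumed layout.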
Could you share the dataset you downloaded with me? I can't download the dataset shared by the author.