MMSA
How MOSI's audio features are obtained
I noticed that the audio features in the MOSI dataset have a length of 5. How were these audio features extracted?