Audio Feature Extraction

Open xiaoxinchaoren56 opened this issue 1 year ago • 0 comments

I noticed that the lengths of the audio and visual features in the MOSI dataset differ from those in MOSEI. May I ask which tools were used to extract the audio and visual features for MOSI? In particular, the audio feature vector in MOSI has length 5. How was it obtained, and was it also extracted using COVAREP? The visual feature vector has length 20. Was it also extracted using Facet? Looking forward to your reply.

Sep 23 '24 12:09 xiaoxinchaoren56