MultiBench Extracting info from the H5 files

Hello,

I am interested in training an audio-only model (or perhaps a bimodal audio-text one) on the CMU-MOSEI data.

I would recompute the audio embeddings myself, so I only need the links to the videos, the timestamps, and the emotions annotated for each timestamp range.

How would I go about extracting this information?

Thanks,

Ed

Aug 08 '23 10:08 mirix

OK, perhaps I am getting somewhere:

import h5py
import numpy as np
import pandas as pd

# The .csd label file can be opened directly with h5py.
filename = '/home/emoman/Downloads/mosei/CMU_MOSEI_Labels.csd'
video_id = 'zv0Jl4TIQDc'

# Open read-only; the context manager closes the file when done.
with h5py.File(filename, 'r') as hf:
    # One row of label values per annotated segment.
    feat = np.array(hf[f'All Labels/data/{video_id}/features'])
    # One (start, end) row per annotated segment.
    intval = np.array(hf[f'All Labels/data/{video_id}/intervals'])

print(pd.DataFrame(feat))
print(pd.DataFrame(intval))

This gives:

          0         1    2         3    4    5    6
0  0.333333  0.666667  0.0  0.666667  0.0  0.0  0.0
1  1.000000  2.000000  0.0  0.000000  0.0  0.0  0.0
2  2.333333  2.666667  0.0  0.000000  0.0  0.0  0.0
        0       1
0  56.852  60.845
1  29.764  35.633
2  42.146  49.242
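
To get from this single-video view to what I originally asked for (video, timestamps, labels per range), the two arrays can be paired row by row. A minimal sketch, assuming row i of `intervals` describes the same segment as row i of `features`, and that the group keys are YouTube video IDs (the URL pattern is my assumption, not something stored in the file). `iter_segments` and `video_url` are hypothetical helper names, and the sample dictionary just repeats the values printed above.

```python
def video_url(video_id):
    # Assumption: the group key is a YouTube video ID.
    return f"https://www.youtube.com/watch?v={video_id}"


def iter_segments(data_group):
    """Yield one record per annotated segment.

    `data_group` is any mapping of video_id -> {"intervals": rows, "features": rows},
    for example hf["All Labels/data"] from h5py, or a plain dict as below.
    """
    for video_id in data_group:
        entry = data_group[video_id]
        # Assumption: interval row i and feature row i belong to the same segment.
        for (start, end), labels in zip(entry["intervals"], entry["features"]):
            yield {
                "video_id": video_id,
                "url": video_url(video_id),
                "start": float(start),
                "end": float(end),
                "labels": [float(v) for v in labels],
            }


# Values copied from the output above.
sample = {
    "zv0Jl4TIQDc": {
        "features": [
            [0.333333, 0.666667, 0.0, 0.666667, 0.0, 0.0, 0.0],
            [1.000000, 2.000000, 0.0, 0.000000, 0.0, 0.0, 0.0],
            [2.333333, 2.666667, 0.0, 0.000000, 0.0, 0.0, 0.0],
        ],
        "intervals": [[56.852, 60.845], [29.764, 35.633], [42.146, 49.242]],
    }
}

# Sort by start time, since the stored rows are not in chronological order.
segments = sorted(iter_segments(sample), key=lambda s: s["start"])
for seg in segments:
    print(seg["video_id"], seg["start"], seg["end"], seg["labels"])
```

With the real file, the same function should accept `hf["All Labels/data"]` inside the `with h5py.File(...)` block, since an h5py group can be iterated and indexed like a dictionary.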

My interpretation is that video zv0Jl4TIQDc has three intervals annotated with the relative weights of Ekman's basic emotions.

Is that correct?

If that is the case, what would be the mapping of the emotions?

What is the highest possible value for a given emotion?

Aug 08 '23 11:08 mirix

Each sentence is annotated for sentiment on a [-3,3] Likert scale of: [−3: highly negative, −2 negative, −1 weakly negative, 0 neutral, +1 weakly positive, +2 positive, +3 highly positive]. Ekman emotions (Ekman et al., 1980) of {happiness, sadness, anger, fear, disgust, surprise} are annotated on a [0,3] Likert scale for presence of emotion x: [0: no evidence of x, 1: weakly x, 2: x, 3: highly x].

So column 0 is the sentiment score, and the remaining columns would be, in this order, {happiness, sadness, anger, fear, disgust, surprise}?

Aug 08 '23 11:08 mirix

The issue with this interpretation is that segment 0 above would have been labelled with happiness and anger in similar amounts...

Aug 08 '23 11:08 mirix

Or is it (Anger Disgust Fear Happy Sad Surprise) as in Table 3?

Then it would be Anger and Fear, which is more consistent, but the sentiment would be slightly positive...
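
To make the two readings concrete, the same row can be labelled under both candidate orders. A minimal sketch, assuming column 0 is the sentiment score in both cases; `PAPER_ORDER`, `TABLE3_ORDER` and `name_emotions` are names made up for this illustration.

```python
# Candidate orders for columns 1..6 (column 0 is assumed to be sentiment).
PAPER_ORDER = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]
TABLE3_ORDER = ["anger", "disgust", "fear", "happy", "sad", "surprise"]


def name_emotions(row, order):
    """Return (sentiment, {emotion: score}), keeping only non-zero emotions."""
    sentiment, scores = row[0], row[1:]
    present = {name: s for name, s in zip(order, scores) if s > 0}
    return sentiment, present


# Segment 0 from the output above.
segment0 = [0.333333, 0.666667, 0.0, 0.666667, 0.0, 0.0, 0.0]

paper_reading = name_emotions(segment0, PAPER_ORDER)
table3_reading = name_emotions(segment0, TABLE3_ORDER)
print(paper_reading)   # happiness and anger are non-zero
print(table3_reading)  # anger and fear are non-zero
```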

Aug 08 '23 11:08 mirix

Checking the entries with the most negative and the most positive sentiment, the order seems to be {happiness, sadness, anger, fear, disgust, surprise}.
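
The check can be sketched as follows: pick the segments with the lowest and highest sentiment and see which emotion column dominates. The three rows below are only the ones printed earlier in this thread (so there is no clearly negative example among them); the real check would run over every video in the file. `dominant_column` is a hypothetical helper, and the emotion order is the one I am assuming.

```python
ASSUMED_ORDER = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]


def dominant_column(row):
    """Index (1..6) of the strongest emotion column, or None if all are zero."""
    scores = row[1:]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best + 1 if scores[best] > 0 else None


# Label rows copied from the output earlier in the thread.
rows = [
    [0.333333, 0.666667, 0.0, 0.666667, 0.0, 0.0, 0.0],
    [1.000000, 2.000000, 0.0, 0.000000, 0.0, 0.0, 0.0],
    [2.333333, 2.666667, 0.0, 0.000000, 0.0, 0.0, 0.0],
]

most_negative = min(rows, key=lambda r: r[0])
most_positive = max(rows, key=lambda r: r[0])

for label, row in (("lowest", most_negative), ("highest", most_positive)):
    col = dominant_column(row)
    name = ASSUMED_ORDER[col - 1] if col else "none"
    print(label, "sentiment", row[0], "-> column", col, name)
```

In this small sample the most positive segment peaks in column 1, which reads as happiness under this order and as anger under the Table 3 order, so the former is the more plausible one.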

Aug 08 '23 13:08 mirix

I have forked MOSEI to build a unimodal speech emotion recognition (SER) dataset:

https://github.com/mirix/messaih/tree/main

Aug 10 '23 12:08 mirix