The performance of the new models is poor for specific languages
Thank you for creating e2v. How can I access the previous model that could only output a few labels instead of 9? I find the new checkpoint (the plus large) to be much worse than the old one, at least for Persian.
The model also hallucinates a lot on short inputs (1-2 seconds), even in English.
You can restrict the logits to specific emotions (such as 5 of them) by masking out the emotions you don't need; see the sketch below. You will get performance similar to the previous model.
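A minimal sketch of the masking idea, assuming a 9-class output. The label names and the `rec_result[0]["scores"]` / `rec_result[0]["labels"]` keys are assumptions about the typical emotion2vec+ output; check them against your own checkpoint before relying on the class order.

```python
import numpy as np

# Assumed 9-class label set of emotion2vec+ (the order in your checkpoint may
# differ; check rec_result[0]["labels"] returned by model.generate to confirm).
ALL_LABELS = ["angry", "disgusted", "fearful", "happy", "neutral",
              "other", "sad", "surprised", "unknown"]

# Subset you actually want to predict, e.g. 5 emotions.
KEEP = {"angry", "happy", "neutral", "sad", "surprised"}

def mask_scores(scores, labels=ALL_LABELS, keep=KEEP):
    """Zero out the unwanted classes and renormalize, so the argmax is taken
    only over the emotions you care about."""
    scores = np.asarray(scores, dtype=np.float64)
    mask = np.array([lab in keep for lab in labels])
    masked = np.where(mask, scores, 0.0)
    masked /= masked.sum()          # renormalize over the kept classes
    return labels[int(np.argmax(masked))], masked

# Example with a dummy score vector; replace it with rec_result[0]["scores"].
pred, probs = mask_scores(np.random.dirichlet(np.ones(len(ALL_LABELS))))
print(pred, probs.round(3))
```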
If I use the feature vectors ('feats') generated by the AutoModel library's model.generate function on audio files as input to train a new model for speech emotion recognition, is this process equivalent to fine-tuning, or to training a downstream model for speech emotion recognition? Are these features equivalent to embeddings or to raw audio features?
I did not quite get your point. We provide emotion2vec for extracting features and emotion2vec+ for classification, and both types of model provide embeddings for further exploration in your own tasks.
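To make the 'feats'-as-features route concrete, here is a minimal sketch. The checkpoint name `iic/emotion2vec_base`, the `granularity`/`extract_embedding` arguments, and the file paths and labels are assumptions or placeholders; verify the exact `generate` call against the FunASR documentation for your checkpoint. Because the emotion2vec weights stay frozen and only a separate classifier is trained on the extracted embeddings, this is downstream training on embeddings rather than fine-tuning (fine-tuning would update the emotion2vec weights themselves).

```python
from funasr import AutoModel  # pip install funasr
import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature-extraction checkpoint (assumed name; any emotion2vec variant that
# returns a 'feats' embedding should work the same way).
extractor = AutoModel(model="iic/emotion2vec_base")

def embed(wav_path):
    # extract_embedding=True asks FunASR to include the utterance-level
    # embedding under the 'feats' key (assumed argument/behavior).
    res = extractor.generate(wav_path, granularity="utterance",
                             extract_embedding=True)
    return np.asarray(res[0]["feats"])

# Placeholder dataset: replace with your own audio paths and emotion labels.
wav_files = ["data/clip_001.wav", "data/clip_002.wav"]
labels = ["happy", "sad"]

X = np.stack([embed(p) for p in wav_files])   # (n_samples, embedding_dim)
y = np.array(labels)

# Downstream classifier trained on the frozen embeddings -- not fine-tuning.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:1]))
```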