The performance of the new models is bad for specific languages

Open Respaired opened this issue 1 year ago • 3 comments

Thank you for creating e2v. How can I access the previous model, which output only a few labels instead of 9? I find this new checkpoint (the plus large) to be much worse than the old one, at least for Persian.

The model also hallucinates a lot on short inputs (1-2 seconds), even in English.

Respaired, Jul 30 '24 17:07

You can restrict the logits to a specific set of emotions (for example, 5) by masking the emotions you don't need. You should get performance similar to the previous model.
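A minimal sketch of that masking step, assuming you already have one logit per label for an utterance. The label order, the helper name `masked_prediction`, and the example logits are illustrative assumptions, not taken from the library; check the labels your checkpoint actually returns.

```python
import numpy as np

# Hypothetical label order, for illustration only.
LABELS = ["angry", "disgusted", "fearful", "happy", "neutral",
          "other", "sad", "surprised", "unknown"]

def masked_prediction(logits, labels, keep):
    """Softmax over the kept labels only; masked labels get probability 0."""
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.array([label in keep for label in labels])
    if not mask.any():
        raise ValueError("keep must contain at least one known label")
    # Masked positions become -inf, so exp() maps them to exactly 0.
    masked = np.where(mask, logits, -np.inf)
    shifted = masked - masked.max()  # subtract the max for numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum()
    return labels[int(probs.argmax())], probs

# Made-up logits where the unmasked winner is "other".
example_logits = [0.1, 0.2, 0.0, 1.0, 0.5, 3.0, 0.3, 0.1, 2.0]
keep = {"angry", "happy", "neutral", "sad"}
label, probs = masked_prediction(example_logits, LABELS, keep)
```

If only the softmax scores are available rather than raw logits, zeroing the unwanted scores and renormalizing the rest to sum to 1 gives the same distribution.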

ddlBoJack, Aug 02 '24 10:08

If I use the feature vectors ('feats') generated by the Automodel library's model.generate function on audio files as input to train a new model for speech emotion recognition, is this process equivalent to fine-tuning, or to training a downstream model for speech emotion recognition? Are these features equivalent to embeddings or to raw audio features?

buanide, Aug 08 '24 08:08

I don't fully understand your question. We provide emotion2vec for feature extraction and emotion2vec+ for classification. Both types of model provide embeddings that you can use for further exploration in your own tasks.
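One way to read the question: training a separate classifier on saved embeddings is downstream training on frozen features, since the upstream model's weights are never updated, whereas fine-tuning would update them. A minimal sketch of such a downstream classifier follows, assuming one fixed-size embedding per utterance. The random arrays are synthetic stand-ins for extracted 'feats', and `train_linear_probe` and `predict` are hypothetical helper names, not part of any library.

```python
import numpy as np

def train_linear_probe(feats, targets, num_classes, lr=0.1, steps=200):
    """Softmax regression on fixed embeddings; the upstream model is untouched."""
    feats = np.asarray(feats, dtype=np.float64)
    targets = np.asarray(targets)
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[targets]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(feats, W, b):
    return (np.asarray(feats, dtype=np.float64) @ W + b).argmax(axis=1)

# Synthetic stand-ins for utterance-level embeddings (NOT real emotion2vec output):
# two well-separated clusters in 8 dimensions, 50 samples each.
rng = np.random.default_rng(0)
centers = np.array([[2.0] * 8, [-2.0] * 8])
y = np.repeat([0, 1], 50)
X = centers[y] + rng.normal(size=(100, 8))

W, b = train_linear_probe(X, y, num_classes=2)
accuracy = (predict(X, W, b) == y).mean()
```

With real data, `X` would be the stacked embeddings and `y` your own emotion labels, and accuracy should be measured on a held-out split rather than on the training set.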

ddlBoJack, Aug 28 '24 10:08