
Do you have any idea why the attention-based model is worse than the RNN model?

Open · WaNePr opened this issue 4 years ago · 5 comments

Hi, I see that the attention-based model performs worse than the RNN-based model in the original paper. Do you have an idea why?

I also read "SPEECH EMOTION RECOGNITION USING MULTI-HOP ATTENTION MECHANISM", where you suggested that MHA-1, which is based on the attention mechanism, gives better performance. However, I see almost no difference between MHA-1 and the attention model you proposed here, except that you used a Bi-LSTM in the latter.

I noticed that you used a different feature extractor (openSMILE vs. Kaldi). Does this account for the difference?

WaNePr · Mar 30 '20 09:03

Hi, your observation is correct. The attention mechanism in MDREA and MHA-1 is the same. There is an implementation issue in the MDREA model: a sequence mask should be applied before the softmax function when computing the attention weights.
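
For reference, here is a minimal sketch of that masking step, assuming TensorFlow-style tensors; the function and variable names (`masked_softmax`, `scores`, `seq_len`) are illustrative, not the repository's code:

```python
import tensorflow as tf

def masked_softmax(scores, seq_len):
    """Mask padded time steps before softmax so they receive ~zero attention weight.

    scores:  [batch, max_time] unnormalized attention scores
    seq_len: [batch] number of valid (non-padded) time steps per example
    """
    max_time = tf.shape(scores)[1]
    valid = tf.sequence_mask(seq_len, maxlen=max_time)  # True for valid steps
    # Replace padded positions with a large negative value (a common stand-in for -inf).
    scores = tf.where(valid, scores, tf.fill(tf.shape(scores), -1e9))
    return tf.nn.softmax(scores, axis=-1)
```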

david-yoon · Apr 07 '20 13:04

Thanks for your reply. I did add a sequence mask before applying the softmax function when computing the attention weights, i.e., I set the padded positions to -inf. However, I still did not get better performance than MDRE. Do you have any hints?

WaNePr · Apr 08 '20 08:04

I think MDREA should perform better than MDRE. Let me share the attention code from another repository. The attention function in this code takes an additional argument, "batch_seq", which indicates the valid sequence length of the input data; it tells the model where to mask.

https://github.com/david-yoon/attentive-modality-hopping-for-SER/blob/master/util/model_luong_attention.py
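
As a rough illustration of how such a function might use batch_seq (this is a sketch under the assumption of TensorFlow tensors, not the actual code in the linked file; `luong_attention_sketch` and its arguments are hypothetical names):

```python
import tensorflow as tf

def luong_attention_sketch(encoder_states, query, batch_seq):
    """Dot-product (Luong-style) attention over RNN outputs with length masking.

    encoder_states: [batch, max_time, hidden] RNN outputs
    query:          [batch, hidden] vector attending over the states
    batch_seq:      [batch] valid (non-padded) length of each sequence
    """
    # Unnormalized score for each time step: dot product with the query.
    scores = tf.reduce_sum(encoder_states * tf.expand_dims(query, 1), axis=-1)    # [batch, max_time]

    # Use batch_seq to mask out padded steps before the softmax.
    valid = tf.sequence_mask(batch_seq, maxlen=tf.shape(encoder_states)[1])
    scores = tf.where(valid, scores, tf.fill(tf.shape(scores), -1e9))

    weights = tf.nn.softmax(scores, axis=-1)                                       # [batch, max_time]
    context = tf.reduce_sum(encoder_states * tf.expand_dims(weights, -1), axis=1)  # [batch, hidden]
    return context, weights
```

Without that mask, the softmax spreads attention weight over zero-padded frames, which could explain MDREA falling below MDRE.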

david-yoon · Apr 09 '20 01:04

Thanks for sharing the implemented luong_attention model (https://github.com/david-yoon/attentive-modality-hopping-for-SER/blob/ba8d010b493267bc004412c688dad85e135a60ba/util/model_luong_attention.py#L21). I tried this one, but MDRE still outperforms the MDREA model.

WaNePr · Apr 09 '20 02:04

I think the prosody features are not a good fit for MHA-1!

SunshlnW · May 15 '20 12:05