multimodal-speech-emotion
Do you have any idea why the attention-based model performs worse than the RNN model?
Hi, I see that the attention-based model performs worse than the RNN-based model in the original paper. Do you have an idea why?
I also read "SPEECH EMOTION RECOGNITION USING MULTI-HOP ATTENTION MECHANISM", where you suggested that MHA-1, based on the attention mechanism, gives better performance. However, I see almost no difference between MHA-1 and the attention model you proposed here, except that you used a Bi-LSTM in the latter.
I noticed that you used a different feature extractor (openSMILE vs. Kaldi). Could this explain the difference?
Hi, Your observation is correct. The attention mechanism in MDREA and MHA-1 is the same. There is an implementation issue in the MDREA model. In particular, a sequence mask should be applied before applying the softmax function when computing the attention weight.
Thanks for your reply. I did add a sequence mask before applying the softmax function when computing the attention weights, that is, I set the unwanted (padded) positions to -inf. However, I still did not get better performance than MDRE. Do you have any hints?
I think the MDREA should perform better than MDRE. Let me share an "attention code" from another repository. The attention function in this code requires an additional argument "batch_seq" that indicates the valid sequence of input data; it tells the model where to mask.
https://github.com/david-yoon/attentive-modality-hopping-for-SER/blob/master/util/model_luong_attention.py
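The masking discussed above can be sketched as follows. This is a minimal NumPy illustration of applying a sequence mask before the softmax, not the repo's actual TensorFlow code; the function name `masked_attention_weights` is hypothetical:

```python
import numpy as np

def masked_attention_weights(scores, batch_seq):
    """Normalize attention scores with a sequence mask.

    scores:    (batch, time) unnormalized attention scores
    batch_seq: (batch,) valid lengths per example; positions at or
               beyond the valid length are set to -inf before softmax,
               so they receive zero attention weight.
    """
    batch, time = scores.shape
    # mask[i, t] is True for valid positions t < batch_seq[i]
    mask = np.arange(time)[None, :] < np.asarray(batch_seq)[:, None]
    masked = np.where(mask, scores, -np.inf)
    # numerically stable softmax over the time axis
    masked = masked - masked.max(axis=1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1.0, 2.0, 3.0]])
# second example has only 2 valid timesteps
weights = masked_attention_weights(scores, batch_seq=[3, 2])
# the padded position in the second row gets zero weight
```

Without the mask, the softmax leaks probability mass onto padded timesteps, which dilutes the attention over the valid frames and can hurt the attention-based model relative to the plain RNN.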
Thanks for sharing the implemented luong_attention model: https://github.com/david-yoon/attentive-modality-hopping-for-SER/blob/ba8d010b493267bc004412c688dad85e135a60ba/util/model_luong_attention.py#L21. I tried it, but MDRE still outperforms the MDREA model.
I think the prosody features are unfit for MHA-1!