
RuntimeError: mat1 dim 1 must match mat2 dim 0

Open jvel07 opened this issue 3 years ago • 2 comments

Hi, congrats on the project! I am getting this error, could not figure out what's going on:

```
  File "/media/user/hk-data/PycharmProjects/dnn_embeddings_pytorch/train_model.py", line 148, in train_model
    output = net(x_train)
  File "/home/user/anaconda3/envs/general_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/user/hk-data/PycharmProjects/dnn_embeddings_pytorch/dnn_models.py", line 240, in forward
    output_logits = self.fc1_linear(complete_embedding)
  File "/home/user/anaconda3/envs/general_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/envs/general_py37/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/user/anaconda3/envs/general_py37/lib/python3.7/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0
```

jvel07 avatar Mar 03 '21 11:03 jvel07

I actually found out why: it is due to the shape of my MFCCs. The shape I have is (1, 40, 498); in your case, it is (1, 40, 282). How could I adapt the model to the shape of my MFCCs?

jvel07 avatar Mar 03 '21 12:03 jvel07

Well done. I encountered a similar problem when changing the shape of the MFCC input.

To change the shape of the MFCCs, you can change the parameters of the mel spectrogram used in the `librosa.feature.mfcc()` call (invoked from `get_features()` in this project). The relevant parameters in `librosa.feature.melspectrogram()` are `n_fft` (length of the FFT window), `hop_length` (number of samples between successive audio frames), and `win_length` (size of each audio frame).
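As a sanity check on those parameters: with librosa's default `center=True` padding, the number of MFCC time frames depends only on the signal length and `hop_length`. A minimal sketch of that arithmetic (the 48 kHz / 3-second / `hop_length=512` numbers below are just an illustration of how a frame count of 282 can arise, not confirmed values from this repo):

```python
def mfcc_n_frames(n_samples: int, hop_length: int) -> int:
    """Number of STFT/MFCC time frames librosa produces with center=True:
    the padded signal yields 1 + floor(n_samples / hop_length) frames."""
    return 1 + n_samples // hop_length

# Illustrative only: a 3-second clip at 48 kHz with hop_length=512
n_samples = 3 * 48000                  # 144000 samples
print(mfcc_n_frames(n_samples, 512))   # 282 time frames
```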

If you're using the RAVDESS dataset, use the exact parameters from this repo and you will get (1, 40, 282). If you're using your own dataset, experiment with the above parameters and see whether you can reach (1, 40, 282); I would start by increasing the win_length parameter.
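Rather than guessing, you can also solve for the `hop_length` that yields a target frame count for your clip length. A small helper, assuming librosa's `center=True` framing (`1 + n_samples // hop_length` frames); `hop_for_frames` is a hypothetical name, not part of this repo:

```python
def hop_for_frames(n_samples: int, target_frames: int) -> int:
    """Smallest hop_length giving at most target_frames time frames,
    assuming librosa's center=True framing (1 + n_samples // hop)."""
    hop = max(1, n_samples // (target_frames - 1))
    # Integer division can overshoot the target; nudge hop upward if so.
    while 1 + n_samples // hop > target_frames:
        hop += 1
    return hop

# Illustrative: a 144000-sample clip needs hop_length=512 to land on 282 frames
print(hop_for_frames(144000, 282))  # 512
```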

Finally, you can always adapt the CNN to your input shape by changing the input dimensions of just the first conv layer, and then adjusting the maxpool kernel (e.g. doubling its size) so that you don't have to change the following conv layers.
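To see what the maxpool adjustment buys you, standard conv/pool output-size arithmetic shows how the time dimension (and hence the flattened size feeding `fc1_linear`) changes. The layer sizes below are illustrative, not the exact ones from this repo:

```python
def conv_out(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of one conv dimension: (n + 2p - k) // s + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out(n: int, kernel: int) -> int:
    """Max-pool output length with stride equal to the kernel (PyTorch default)."""
    return n // kernel

# A 3x3 conv with padding=1 preserves the time dimension,
# so only the pooling shrinks it:
t_282 = pool_out(conv_out(282, 3, padding=1), 2)  # 2-wide pool -> 141 frames
t_498 = pool_out(conv_out(498, 3, padding=1), 4)  # doubled pool -> 124 frames
print(t_282, t_498)
```

Note the two results are close but not identical, so the `in_features` of the final linear layer still has to be recomputed for your shape (or inferred automatically, e.g. with `torch.nn.LazyLinear`).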


IliaZenkov avatar Mar 03 '21 15:03 IliaZenkov