AVSegFormer
Problem: shape '[1029, 320, 32]' is invalid for input of size 1317120
When training on the AVSS dataset, the audio features extracted by VGGish have a first dimension of bs * 10, which does not match the later feature matrices whose first dimension is bs. The problem surfaces at `out2 = self.cross_attn(query, src, src, key_padding_mask=padding_mask)[0]`, which raises the following error:

```
File "/home/ptr/hzw/AVSegFormer-master/model/AVSegFormer.py", line 75, in forward
    pred, mask_feature = self.head(img_feat, audio_feat)
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/head/AVSegHead.py", line 223, in forward
    memory, outputs = self.transformer(query, src_flatten, spatial_shapes,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 160, in forward
    outputs = self.decoder(query, memory, reference_points,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 139, in forward
    out = layer(out, src, reference_points, spatial_shapes,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 117, in forward
    out2 = self.cross_attn(
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1003, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/functional.py", line 5044, in multi_head_attention_forward
    k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[1029, 320, 32]' is invalid for input of size 1317120
```
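The numbers in the error message can be decoded by hand to confirm that this is a batch-dimension mismatch between the query and the key/value tensors. This is a sanity-check sketch, assuming PyTorch's usual `(src_len, bsz * num_heads, head_dim)` view inside `multi_head_attention_forward`, where `bsz` is taken from the query:

```python
# Target view from the error: '[1029, 320, 32]'
src_len, expected_bh, head_dim = 1029, 320, 32
actual_numel = 1317120          # size of the real key tensor

# How many elements would the requested view need?
expected_numel = src_len * expected_bh * head_dim
print(expected_numel)           # 10536960, i.e. 8x the actual 1317120

# How many batch*heads slots does the real tensor actually hold?
actual_bh = actual_numel // (src_len * head_dim)
print(actual_bh)                # 40, not the 320 implied by the query's batch
```

So the query carries a batch dimension implying 320 batch·head slots while the key/value tensor only provides 40, consistent with the report that the audio and visual streams reach `self.cross_attn` with different first dimensions.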
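One generic way to reconcile the two streams is to repeat each video's visual features once per audio clip so both tensors share the same leading dimension. This is only a sketch: the shapes below (`bs`, `clips`, channel sizes) are illustrative assumptions, not values taken from AVSegFormer's config, and the actual fix in the repo may instead lie in how the AVSS dataloader batches frames:

```python
import numpy as np

# Illustrative shapes (assumptions, not AVSegFormer's real config).
bs, clips = 4, 10                          # AVSS: 10 one-second audio clips per video
audio_feat = np.zeros((bs * clips, 128))   # VGGish output: (bs*10, 128)
img_feat = np.zeros((bs, 1029, 256))       # flattened visual tokens per video

# Repeat each video's visual tokens `clips` times along the batch axis
# so cross-attention sees matching batch sizes on both inputs.
img_feat_aligned = np.repeat(img_feat, clips, axis=0)

print(img_feat_aligned.shape)              # (40, 1029, 256), matching the audio batch
```

In PyTorch the equivalent call is `img_feat.repeat_interleave(clips, dim=0)`; alternatively, the VGGish features can be reshaped to `(bs, clips, 128)` and consumed one frame at a time, whichever matches how the rest of the pipeline batches the AVSS frames.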