AVSegFormer issues

einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (1) for operand 1 and no ellipsis was given

File "/home/hwh/Project/AVS/AVSegFormer-master/model/head/AVSegHead.py", line 238, in forward
    mask_feature = self.fusion_block(mask_feature, audio_feat)
File "/home/hwh/anaconda3/envs/AVS39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/hwh/Project/AVS/AVSegFormer-master/model/utils/fusion_block.py", line 44, in forward
    fusion_map = torch.einsum('bchw,bc->bchw', feature_map, x.squeeze())...
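
The error message says that operand 1 (`x.squeeze()`) has one dimension while the subscript `bc` expects two. One plausible cause, assuming `x` has a shape like `(B, 1, C)` (the traceback does not show it, so this is an assumption), is that a batch of size 1 makes `squeeze()` drop the batch axis along with the singleton axis. The sketch below models only the shape rule of `torch.squeeze` in plain Python; `squeeze_shape` is a hypothetical helper, not part of AVSegFormer or PyTorch, and `C = 256` is illustrative.

```python
def squeeze_shape(shape, dim=None):
    """Model the shape returned by torch.squeeze (hypothetical helper).

    With dim=None every size-1 axis is dropped; with an integer dim only
    that axis is dropped, and only if its size is 1.
    """
    if not shape:
        return ()
    if dim is None:
        return tuple(s for s in shape if s != 1)
    dim = dim % len(shape)  # allow negative indices
    return tuple(s for i, s in enumerate(shape) if i != dim or s != 1)


# Assumed audio feature shape (B, 1, C); C = 256 is illustrative only.
batch_of_four = (4, 1, 256)
batch_of_one = (1, 1, 256)

print(squeeze_shape(batch_of_four))        # (4, 256): two axes, matches 'bc'
print(squeeze_shape(batch_of_one))         # (256,): one axis, 'bc' cannot match
print(squeeze_shape(batch_of_one, dim=1))  # (1, 256): batch axis is kept
```

If this is indeed the cause, squeezing one explicit axis (for example `x.squeeze(1)`) or avoiding a trailing batch of size 1 (for example with `drop_last=True` on the `DataLoader`) would likely sidestep the error.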

Huangwenhu1

There is a problem with the reproduction results

Hello, I ran this model on the S4 dataset with image size (224, 224), but the result I reproduced is only 0.734. I did not modify the configuration file.

ButlerZOH

Problem shape '[1029, 320, 32]' is invalid for input of size 1317120

When training with the AVSS dataset, the audio_fea extracted by VGGish has bs * 10 as its first dimension, which does not match the subsequent feature matrix with bs in...
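
The error in the title means that the number of elements in the tensor differs from what the requested shape needs. A quick check using only the two numbers quoted in the error message (everything else here, including the helper name `check_reshape`, is hypothetical) shows how far apart they are:

```python
import math


def check_reshape(target_shape, input_size):
    """Return (elements_needed, fits) for reshaping input_size elements."""
    needed = math.prod(target_shape)
    return needed, needed == input_size


target_shape = (1029, 320, 32)  # shape quoted in the error message
input_size = 1317120            # input size quoted in the error message

needed, fits = check_reshape(target_shape, input_size)
print(needed, fits)  # 10536960 False

# Elements available per index of the first dimension vs. elements required.
per_row_available = input_size // target_shape[0]
per_row_required = math.prod(target_shape[1:])
print(per_row_available, per_row_required)  # 1280 10240
```

The tensor holds 1280 values per index of the first dimension where the reshape expects 10240, a factor of 8. Which axis is actually off cannot be determined from the truncated report alone.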

KOLE-LE

Question about the AVSS pre-training

When training the model on the AVSS dataset, we find that the mIoU is about 20 with the ResNet-50 backbone and about 30 with the PVT-v2 backbone after 11 epochs. Could...

SitongGong