BEVFormer
Question about TSA attention weight
Hi, thanks for your nice work!
I'm confused by the initialization of the attention weights. They are all set to zero. https://github.com/fundamentalvision/BEVFormer/blob/66b65f3a1f58caf0507cb2a971b9c0e7f842376c/projects/mmdet3d_plugin/bevformer/modules/temporal_self_attention.py#L123
Why are the weights set to zero? Won't gradient vanishing happen in this linear layer? It seems to me that, initialized this way, the gradients of the weights would also be zero during back-propagation, so the layer could never update.
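To make the question concrete, here is a minimal toy sketch of what I mean (my own reproduction, not the actual module; the dimensions, the softmax, and the dummy loss are just placeholders for illustration), zero-initializing a linear layer that produces attention weights and then looking at the gradient it receives:

```python
import torch
import torch.nn as nn

# Toy stand-in for the `attention_weights` linear layer, zero-initialized
# the same way as in the linked line (assumed dimensions, not the real config).
embed_dims, num_heads, num_levels, num_points = 256, 8, 1, 4
attention_weights = nn.Linear(embed_dims, num_heads * num_levels * num_points)
nn.init.constant_(attention_weights.weight, 0.0)
nn.init.constant_(attention_weights.bias, 0.0)

# Dummy query and dummy loss, just to run one backward pass and inspect
# the gradient that reaches the zero-initialized weight.
query = torch.randn(2, 10, embed_dims)
weights = attention_weights(query).softmax(-1)
loss = (weights * torch.randn_like(weights)).sum()
loss.backward()

print(attention_weights.weight.grad.abs().max())  # does this stay at zero?
```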