
Question about TSA attention weight

Open Angericky opened this issue 6 months ago • 0 comments

Hi, thanks for your nice work!

I'm confused by the initialization of the attention weights. They are all set to zero: https://github.com/fundamentalvision/BEVFormer/blob/66b65f3a1f58caf0507cb2a971b9c0e7f842376c/projects/mmdet3d_plugin/bevformer/modules/temporal_self_attention.py#L123

Why are the weights set to zero? Won't vanishing gradients occur in the linear layer? It seems that, this way, the gradients of the weights would also be zero during backpropagation.
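Here is a toy PyTorch sketch (a standalone zero-initialized linear layer with made-up shapes, not the actual TSA module) that can be used to check whether the weight gradients of such a layer actually vanish:

```python
import torch

# Hypothetical toy layer, zero-initialized like the TSA attention_weights projection.
layer = torch.nn.Linear(4, 2)
torch.nn.init.zeros_(layer.weight)
torch.nn.init.zeros_(layer.bias)

x = torch.randn(3, 4)   # random input batch
out = layer(x)          # output is all zeros at this point
out.sum().backward()

# grad_W = grad_output^T @ x depends on the input x, not on the current
# value of W, so it is generally non-zero even when W starts at zero.
print(layer.weight.grad)
```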

Angericky · Aug 19 '24 14:08