BEVFormer
Question about TSA attention weight
Hi, thanks for your nice work!
I'm confused by the initialization of the attention weights. They are all set to zero. https://github.com/fundamentalvision/BEVFormer/blob/66b65f3a1f58caf0507cb2a971b9c0e7f842376c/projects/mmdet3d_plugin/bevformer/modules/temporal_self_attention.py#L123
Why are the weights set to zero? Won't gradient vanishing happen in this linear layer? It seems to me that, initialized this way, the gradients of the weights would also be zero during back-propagation, so the layer could never update.
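To make the question concrete, here is a minimal toy sketch of what I mean (my own reproduction, not the actual module; the dimensions, the softmax, and the dummy loss are just placeholders for illustration), zero-initializing a linear layer that produces attention weights and then looking at the gradient it receives:

```python
import torch
import torch.nn as nn

# Toy stand-in for the `attention_weights` linear layer, zero-initialized
# the same way as in the linked line (assumed dimensions, not the real config).
embed_dims, num_heads, num_levels, num_points = 256, 8, 1, 4
attention_weights = nn.Linear(embed_dims, num_heads * num_levels * num_points)
nn.init.constant_(attention_weights.weight, 0.0)
nn.init.constant_(attention_weights.bias, 0.0)

# Dummy query and dummy loss, just to run one backward pass and inspect
# the gradient that reaches the zero-initialized weight.
query = torch.randn(2, 10, embed_dims)
weights = attention_weights(query).softmax(-1)
loss = (weights * torch.randn_like(weights)).sum()
loss.backward()

print(attention_weights.weight.grad.abs().max())  # does this stay at zero?
```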