axial-deeplab
Why batch normalization after the qkv transform?
I wonder why batch normalization is applied after the qkv transform. Is it because of the covariate shift issue?
https://github.com/csrhddlam/axial-deeplab/blob/79088edb4bdb8c94351d85f54272ec12b9e79c8b/lib/models/axialnet.py#L31-L34
How does BatchNorm2d work when calculating the similarity score? It really confuses me.
Thanks