Why is self.gamma initialized to zero?
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrissCrossAttention(nn.Module):
        """Criss-Cross Attention Module."""
        def __init__(self, in_dim):
            super(CrissCrossAttention, self).__init__()
            self.channel_in = in_dim
            # 1x1 convs project the input to query/key (reduced to in_dim // 8 channels) and value spaces
            self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
            self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
            self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
            # learnable scale for the attention branch, initialized to zero
            self.gamma = nn.Parameter(torch.zeros(1))

        def forward(self, x):
            proj_query = self.query_conv(x)
            proj_key = self.key_conv(x)
            proj_value = self.value_conv(x)
            # ca_weight / ca_map are the repo's criss-cross affinity and aggregation ops (CUDA extensions)
            energy = ca_weight(proj_query, proj_key)
            attention = F.softmax(energy, 1)
            out = ca_map(attention, proj_value)
            # residual connection scaled by the learnable gamma
            out = self.gamma * out + x
            return out
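The snippet above calls ca_weight and ca_map, which in the official repo are compiled CUDA ops. For readers who want to trace the shapes without building those extensions, below is a minimal pure-PyTorch sketch of what they compute. This is my own reconstruction, not the repo's code, and it differs slightly from the kernel: the CUDA version returns H+W-1 affinities per pixel (counting the centre pixel once), while this sketch keeps the centre in both the column and row parts, giving H+W.

    import torch

    def ca_weight(q, k):
        # Sketch of the criss-cross affinity op (not the official CUDA kernel).
        # q, k: (B, C, H, W). For each pixel, compute dot-product affinities
        # to every pixel in its column (H entries) and in its row (W entries).
        energy_col = torch.einsum("bchw,bciw->bihw", q, k)  # (B, H, H, W)
        energy_row = torch.einsum("bchw,bchj->bjhw", q, k)  # (B, W, H, W)
        return torch.cat([energy_col, energy_row], dim=1)   # (B, H+W, H, W)

    def ca_map(attention, v):
        # Sketch of the criss-cross aggregation op (not the official CUDA kernel).
        # attention: (B, H+W, H, W), v: (B, C, H, W). Aggregate values along
        # the criss-cross path using the softmax-normalized weights.
        B, C, H, W = v.shape
        att_col = attention[:, :H]  # weights over the column, (B, H, H, W)
        att_row = attention[:, H:]  # weights over the row,    (B, W, H, W)
        out_col = torch.einsum("bihw,bciw->bchw", att_col, v)
        out_row = torch.einsum("bjhw,bchj->bchw", att_row, v)
        return out_col + out_row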
Thanks for your reply. I am still confused: why is self.gamma initialized to zero?
@XiXiRuPan Initializing gamma to zero makes the RCCA module an identity connection at the beginning of training, which helps training proceed smoothly. Because gamma is a learnable parameter, it moves away from zero as training progresses, and the RCCA module gradually starts to contribute.
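To see concretely what the identity connection means: at initialization gamma is zero, so the attention branch is multiplied out and the module returns its input unchanged. Here is a quick check using the class above together with the pure-PyTorch sketch (again, a hedged reconstruction, not the official CUDA path):

    # At init gamma == 0, so out = 0 * attn_out + x == x exactly.
    x = torch.randn(2, 64, 16, 20)
    cca = CrissCrossAttention(64)
    with torch.no_grad():
        out = cca(x)
    print(torch.allclose(out, x))  # True: the module is an identity mapping at init

Note that gradients still flow to gamma through the attention branch (d out / d gamma is the branch output), so training can move gamma away from zero even though the branch contributes nothing to the forward output at first.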
If there is no gamma, will the accuracy be affected?