Why is self.gamma initialized to zero?
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrissCrossAttention(nn.Module):
        """Criss-Cross Attention Module."""
        def __init__(self, in_dim):
            super(CrissCrossAttention, self).__init__()
            self.channel_in = in_dim
            # 1x1 convs project the input to query/key (reduced to in_dim // 8 channels) and value spaces
            self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
            self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
            self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
            # learnable scale for the attention branch, initialized to zero
            self.gamma = nn.Parameter(torch.zeros(1))

        def forward(self, x):
            proj_query = self.query_conv(x)
            proj_key = self.key_conv(x)
            proj_value = self.value_conv(x)
            # ca_weight / ca_map are the repo's criss-cross affinity and aggregation ops (CUDA extensions)
            energy = ca_weight(proj_query, proj_key)
            attention = F.softmax(energy, 1)
            out = ca_map(attention, proj_value)
            # residual connection scaled by the learnable gamma
            out = self.gamma * out + x
            return out
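The snippet above calls ca_weight and ca_map, which in the official repo are compiled CUDA ops. For readers who want to trace the shapes without building those extensions, below is a minimal pure-PyTorch sketch of what they compute. This is my own reconstruction, not the repo's code, and it differs slightly from the kernel: the CUDA version returns H+W-1 affinities per pixel (counting the centre pixel once), while this sketch keeps the centre in both the column and row parts, giving H+W.

    import torch

    def ca_weight(q, k):
        # Sketch of the criss-cross affinity op (not the official CUDA kernel).
        # q, k: (B, C, H, W). For each pixel, compute dot-product affinities
        # to every pixel in its column (H entries) and in its row (W entries).
        energy_col = torch.einsum("bchw,bciw->bihw", q, k)  # (B, H, H, W)
        energy_row = torch.einsum("bchw,bchj->bjhw", q, k)  # (B, W, H, W)
        return torch.cat([energy_col, energy_row], dim=1)   # (B, H+W, H, W)

    def ca_map(attention, v):
        # Sketch of the criss-cross aggregation op (not the official CUDA kernel).
        # attention: (B, H+W, H, W), v: (B, C, H, W). Aggregate values along
        # the criss-cross path using the softmax-normalized weights.
        B, C, H, W = v.shape
        att_col = attention[:, :H]  # weights over the column, (B, H, H, W)
        att_row = attention[:, H:]  # weights over the row,    (B, W, H, W)
        out_col = torch.einsum("bihw,bciw->bchw", att_col, v)
        out_row = torch.einsum("bjhw,bchj->bchw", att_row, v)
        return out_col + out_row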
Thanks for your reply. I am still confused: why is self.gamma initialized to zero?
@XiXiRuPan Initializing gamma to zero makes the RCCA module an identity connection at the beginning of training, which helps training proceed smoothly. Because gamma is a learnable parameter, it moves away from zero as training progresses, and the RCCA module gradually starts to contribute.
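To see concretely what the identity connection means: at initialization gamma is zero, so the attention branch is multiplied out and the module returns its input unchanged. Here is a quick check using the class above together with the pure-PyTorch sketch (again, a hedged reconstruction, not the official CUDA path):

    # At init gamma == 0, so out = 0 * attn_out + x == x exactly.
    x = torch.randn(2, 64, 16, 20)
    cca = CrissCrossAttention(64)
    with torch.no_grad():
        out = cca(x)
    print(torch.allclose(out, x))  # True: the module is an identity mapping at init

Note that gradients still flow to gamma through the attention branch (d out / d gamma is the branch output), so training can move gamma away from zero even though the branch contributes nothing to the forward output at first.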
If there is no gamma, will the accuracy be affected?