Anonymous
loss_sum = loss_X1 + loss_X2 + 0.2 * loss_dcd
The coefficient of loss_dcd is -0.2, is that right?
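To make the sign question concrete, here is a minimal sketch with made-up scalar values (not taken from the repository). The helper `combine_losses` and its `dcd_weight` parameter are hypothetical names used only for illustration; the sketch does not say which sign the authors intended.

```python
def combine_losses(loss_x1, loss_x2, loss_dcd, dcd_weight):
    """Weighted sum of the three loss terms; dcd_weight is the coefficient in question."""
    return loss_x1 + loss_x2 + dcd_weight * loss_dcd


# Made-up scalar values, for illustration only.
loss_x1, loss_x2, loss_dcd = 1.0, 2.0, 0.5

total_plus = combine_losses(loss_x1, loss_x2, loss_dcd, dcd_weight=0.2)
total_minus = combine_losses(loss_x1, loss_x2, loss_dcd, dcd_weight=-0.2)

# The two totals differ by 2 * 0.2 * loss_dcd.
print(total_plus, total_minus)
```

With a positive weight, minimizing the total also pushes loss_dcd down; with a negative weight, minimizing the total pushes loss_dcd up. Which behavior is wanted depends on how loss_dcd is defined.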
class CrissCrossAttention(nn.Module):
    """Criss-Cross Attention Module"""
    def __init__(self, in_dim):
        super(CrissCrossAttention, self).__init__()
        self.chanel_in = in_dim
        self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels=in_dim, ...
`class CC_module(nn.Module):
    def __init__(self, in_dim):
        super(CC_module, self).__init__()
        self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        self.softmax = Softmax(dim=3)
        self.INF = INF
        self.gamma = nn.Parameter(torch.zeros(1))...
Hi, I'm having trouble with the loss becoming NaN (the first-epoch loss is 5021, which is too large). Can you give me some advice? Thanks very much.
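For a NaN loss like this, two common first checks are to stop at the first non-finite loss value and to cap the gradient norm. The sketch below is pure Python and framework-agnostic; `assert_finite` and `clip_by_global_norm` are hypothetical helper names, not functions from this repository. In a PyTorch training loop, `torch.isfinite` and `torch.nn.utils.clip_grad_norm_` play the corresponding roles.

```python
import math


def assert_finite(loss_value, step):
    """Raise as soon as the loss is NaN or infinite, so the failing step is known."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(f"non-finite loss {loss_value} at step {step}")
    return loss_value


def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns the (possibly scaled) gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Illustrative values only: a large but finite loss passes the check,
# and a gradient with norm 5 is scaled down to norm 1.
assert_finite(5021.0, step=0)
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(clipped, norm_before)
```

Logging the pre-clipping norm at each step can show whether the gradients grow steadily before the loss turns NaN. If they do, a learning rate that is too high is a likely cause.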
Thanks for sharing. I found that the code only calculates the positive loss. I am confused about where the negative-pair loss is calculated. Thanks for your reply.
Thanks for sharing. Did all of the distillation experiments work with the CE loss? I have a question about this training strategy. First, the well-trained teacher model has fixed parameters,...
Thanks for your contribution to the community. When will the speaker labels be released? Thank you.