SimCSE-Chinese-Pytorch: Results of continued training

Results of continued training

Open prettyprettyboy opened this issue 1 year ago • 2 comments

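Hello. Regarding the earlier problem that prediction does not produce a label: besides applying a threshold, my approach was to first train supervised SimCSE on SNLI to get a checkpoint, whose Spearman coefficient is close to the numbers in your table. I then added an MLP layer on top of SimCSE and fine-tuned it on SNLI, roughly as follows: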

import torch
import torch.nn as nn


class SimCSE_with_mlp(nn.Module):
    def __init__(self, SimCSE_model):
        super().__init__()
        self.SimCSE = SimCSE_model  # encoder loaded from the supervised SimCSE checkpoint
        self.linear = nn.Linear(2 * 768, 3)  # concatenated pair (2 x 768) -> 3 SNLI classes

    def forward(self, input_ids1, attention_mask1, token_type_ids1,
                input_ids2, attention_mask2, token_type_ids2):
        # Encode each sentence independently with the shared encoder.
        output1 = self.SimCSE(input_ids1, attention_mask1, token_type_ids1)
        output2 = self.SimCSE(input_ids2, attention_mask2, token_type_ids2)
        # Concatenate the two sentence embeddings and classify the pair.
        output = torch.cat([output1, output2], dim=1)
        output_score = self.linear(output)
        return output_score

The resulting F1 is 0.68, far below what I get by classifying [cls]sentence1[sep]sentence2[sep] directly with BERT. Do you know what might be causing this?

Aug 01 '22 02:08 prettyprettyboy

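I haven't run this kind of experiment, so this is just my understanding and it may not be right. As I understand it, SimCSE aims to make the embedding distribution more uniform, and the final 3-way classification constrains that distribution again. If you want to go this route, try freezing the SimCSE parameters and training only the MLP parameters. Also, [cls]sentence1[sep]sentence2[sep] is itself a valid approach. You can refer to Su Jianlin's article: 苏剑林. (Jan. 06, 2022). 《CoSENT（一）：比Sentence-BERT更有效的句向量方案》 ("CoSENT (1): A sentence embedding scheme more effective than Sentence-BERT") [Blog post]. Retrieved from https://spaces.ac.cn/archives/8847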
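
A minimal sketch of the freezing idea, under stated assumptions: `DummyEncoder` is a hypothetical stand-in for the trained SimCSE encoder (it is not the repository's model), the hidden size 768 and the 3 labels follow the snippet in the question, and the class name `FrozenSimCSEClassifier` is made up for illustration. Only the linear head receives gradients.

```python
import torch
import torch.nn as nn


class DummyEncoder(nn.Module):
    """Hypothetical stand-in for the SimCSE encoder: one vector per sentence."""

    def __init__(self, vocab_size=100, hidden=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)

    def forward(self, input_ids, attention_mask, token_type_ids):
        # Mean-pool token embeddings over the non-padded positions.
        mask = attention_mask.unsqueeze(-1).float()
        return (self.embed(input_ids) * mask).sum(1) / mask.sum(1).clamp(min=1.0)


class FrozenSimCSEClassifier(nn.Module):
    def __init__(self, encoder, hidden=768, num_labels=3):
        super().__init__()
        self.encoder = encoder
        # Freeze the encoder so fine-tuning cannot reshape its embedding space.
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.linear = nn.Linear(2 * hidden, num_labels)

    def forward(self, ids1, mask1, types1, ids2, mask2, types2):
        self.encoder.eval()  # keep dropout etc. disabled in the frozen encoder
        with torch.no_grad():
            u = self.encoder(ids1, mask1, types1)
            v = self.encoder(ids2, mask2, types2)
        return self.linear(torch.cat([u, v], dim=1))


model = FrozenSimCSEClassifier(DummyEncoder())
# Hand only the trainable (head) parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# Random toy batch: 4 sentence pairs of 16 token ids each.
ids = torch.randint(0, 100, (4, 16))
mask = torch.ones_like(ids)
types = torch.zeros_like(ids)
labels = torch.randint(0, 3, (4,))

logits = model(ids, mask, types, ids, mask, types)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```

With a real checkpoint, the stand-in would be replaced by the loaded SimCSE model; whether freezing actually closes the gap to the cross-encoder baseline is something only an experiment can tell.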

Aug 01 '22 02:08 vdogmcgee

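In fact, [cls]sentence1[sep]sentence2[sep] is already an interaction-based model: the model can learn the interaction between the two sentences. SimCSE, CoSENT, and whitening do not use any interaction information; each sentence goes through the model on its own to produce an embedding. So it is entirely normal for the [cls]sentence1[sep]sentence2[sep] approach to score a few points higher.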
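
To make the difference concrete, here is a small illustrative sketch of how the inputs are laid out in the two setups. The helper names and the whitespace tokenization are assumptions for illustration only; a real pipeline would use the BERT tokenizer. In the interaction-based layout both sentences share one sequence, so self-attention can relate tokens across them; in the embedding-based layout each sentence is encoded alone.

```python
def build_cross_encoder_input(s1_tokens, s2_tokens):
    """Interaction-based layout: one sequence containing both sentences."""
    tokens = ["[CLS]"] + s1_tokens + ["[SEP]"] + s2_tokens + ["[SEP]"]
    # Segment ids: 0 for [CLS] + sentence 1 + [SEP], 1 for sentence 2 + [SEP].
    token_type_ids = [0] * (len(s1_tokens) + 2) + [1] * (len(s2_tokens) + 1)
    return tokens, token_type_ids


def build_bi_encoder_inputs(s1_tokens, s2_tokens):
    """Embedding-based layout: two separate sequences, encoded independently."""
    return [["[CLS]"] + s + ["[SEP]"] for s in (s1_tokens, s2_tokens)]


# Toy whitespace tokenization (hypothetical; stands in for a real tokenizer).
s1 = "a man plays guitar".split()
s2 = "a person makes music".split()

cross_tokens, cross_types = build_cross_encoder_input(s1, s2)
bi_inputs = build_bi_encoder_inputs(s1, s2)

print(cross_tokens)
print(cross_types)
print(bi_inputs)
```

In the second layout the only point where the two sentences meet is whatever is applied to the two output vectors afterwards, such as the concatenation plus linear layer in the question.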

Oct 16 '23 11:10 shexuan
