SimCSE
Question: inference hint: newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
I used the latest release and trained on my own dataset with run_sup_example.sh.
But when I used SimCSE to load the model, I got this hint.
Code used:
embedder = SimCSE(model_name_or_path=model_name)
/SimCSE/result/my-sup-simcse-bert-base-uncased and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
More importantly, this leads to a different result every time.
In addition, I also tried loading one of your released models directly (e.g. princeton-nlp/sup-simcse-roberta-base) without any problem, and the result was the same every time.
In models.py:
# line 281
        self.bert = BertModel(config, add_pooling_layer=False)
# line 340
        self.roberta = RobertaModel(config, add_pooling_layer=False)
So the model is trained without the pooling layer and uses "cls_before_pooler" instead.
However, tool.py still uses the pooler layer:
# line 40
elif "unsup" in model_name_or_path:   # changing "unsup" to "sup" here works
    logger.info("Use `cls_before_pooler` for unsupervised models. If you want to use other pooling policy, specify `pooler` argument.")
    self.pooler = "cls_before_pooler"
else:
    self.pooler = "cls"

# line 76
if self.pooler == "cls":
    # models.py does not use the pooler, so this branch reads the newly
    # initialized pooler weights.
    embeddings = outputs.pooler_output
elif self.pooler == "cls_before_pooler":
    embeddings = outputs.last_hidden_state[:, 0]
else:
    raise NotImplementedError
if normalize_to_unit:
    embeddings = embeddings / embeddings.norm(dim=1, keepdim=True)
embedding_list.append(embeddings.cpu())
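The checkpoint-name heuristic quoted above can be modeled as a tiny standalone function (a sketch; `choose_pooler` is a hypothetical name, not part of tool.py) to show why a local supervised checkpoint falls through to "cls":

```python
def choose_pooler(model_name_or_path: str) -> str:
    # Mirrors the heuristic in tool.py: only names containing "unsup"
    # get "cls_before_pooler"; everything else defaults to "cls".
    if "unsup" in model_name_or_path:
        return "cls_before_pooler"
    return "cls"

# A local supervised checkpoint name contains "sup" but not "unsup",
# so it gets "cls" and tool.py reads the randomly re-initialized pooler.
print(choose_pooler("/SimCSE/result/my-sup-simcse-bert-base-uncased"))      # cls
print(choose_pooler("princeton-nlp/unsup-simcse-bert-base-uncased"))        # cls_before_pooler
```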
I ran into the same problem. Did you solve it?
Solved; see the reply above. Because add_pooling_layer=False is set during training, while tool.py still uses this layer's parameters in the supervised case, the pooler is randomly initialized at inference time and the results differ on every run. So there are two ways to fix it:
- If you fine-tune the model yourself, set add_pooling_layer=True in the training model so the pooling layer is kept.
- If you do not want the pooling layer, modify tool.py to use the hidden state (self.pooler = "cls_before_pooler") instead of cls:
elif "unsup" in model_name_or_path: # changing "unsup" to "sup" here works
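The two options can be contrasted with a minimal dependency-free sketch (the `Outputs` class and `embed` helper below are hypothetical stand-ins for the transformers model output and the branch in tool.py, not the actual API):

```python
class Outputs:
    # Stand-in for a transformers model output: last_hidden_state is
    # [batch][seq_len][hidden]; pooler_output exists only when the model
    # was built with add_pooling_layer=True (fix 1).
    def __init__(self, last_hidden_state, pooler_output=None):
        self.last_hidden_state = last_hidden_state
        self.pooler_output = pooler_output

def embed(outputs, pooler):
    if pooler == "cls":
        # Needs trained pooler weights; if training used
        # add_pooling_layer=False, these weights are random at load
        # time, so the embeddings change on every run.
        if outputs.pooler_output is None:
            raise ValueError("no pooler_output: trained with add_pooling_layer=False")
        return outputs.pooler_output
    elif pooler == "cls_before_pooler":
        # Fix 2: take the raw [CLS] hidden state, which is deterministic.
        return [seq[0] for seq in outputs.last_hidden_state]
    raise NotImplementedError

# One sequence of two tokens with a 3-dim hidden state.
out = Outputs(last_hidden_state=[[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]])
print(embed(out, "cls_before_pooler"))  # [[0.1, 0.2, 0.3]]
```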
@zhiqiangohuo thanks for the clarification! We also have relevant instructions in the README.