wangzhao88 comments

Results: 2 comments of wangzhao88

C-MTP (labeled) data inquiry

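''' Thanks! And apologies: the paper contains some errata, which we will correct later. I double-checked, and the final data includes the full training sets of t2ranking, dulreader, mmarco, cmedqav2, mulit-cpr, nli-zh, ocmnli, and cmnli. The data was lightly filtered with text2vec, hard negatives were mined with bge for t2ranking, dulreader, and mmarco, and for the NLI data the samples with label=0 were used as negatives. Training used train_group_size=2 and ran for 5 epochs. '''

Hello, I am reproducing the bge training process and would like to ask about some details of this part. When C-MTP-labeled was filtered with text2vec, was the model GanymedeNil/text2vec-large-chinese used with the threshold set to 0.43, and after filtering, were the remaining samples with label=0 used as the negatives for their specific sentence1? I am not sure whether I have understood this correctly. Thank you.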
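To make the question concrete, here is a minimal Python sketch of the pipeline as I understand it, not the authors' actual code. Everything in it is an assumption taken from my reading above: that pairs scoring below the threshold are dropped, that the threshold is 0.43, that any label other than 0 marks a positive, and that the output is grouped per sentence1 into query/pos/neg records. `build_training_groups` and `toy_similarity` are hypothetical names; `toy_similarity` is only a token-overlap stand-in for the cosine similarity that a text2vec model would produce.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str, int]  # (sentence1, sentence2, label)


def toy_similarity(a: str, b: str) -> float:
    """Stand-in scorer: Jaccard overlap of whitespace tokens.

    A real run would replace this with cosine similarity between
    embeddings from a text2vec model (assumption, not confirmed).
    """
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_training_groups(
    pairs: Iterable[Pair],
    similarity: Callable[[str, str], float],
    threshold: float = 0.43,  # value from my question, unconfirmed
    negative_label: int = 0,
) -> List[Dict[str, object]]:
    """Filter pairs by similarity, then split the survivors by label."""
    pos: Dict[str, List[str]] = defaultdict(list)
    neg: Dict[str, List[str]] = defaultdict(list)

    for s1, s2, label in pairs:
        # Step 1 (assumed direction): drop pairs scoring below the threshold.
        if similarity(s1, s2) < threshold:
            continue
        # Step 2: surviving label=0 pairs become negatives for this sentence1;
        # every other label is treated as a positive (assumption).
        (neg if label == negative_label else pos)[s1].append(s2)

    # Keep only queries that ended up with both a positive and a negative.
    return [
        {"query": q, "pos": pos[q], "neg": neg[q]}
        for q in pos
        if neg[q]
    ]


example_pairs = [
    ("a b c", "a b c d", 1),  # similarity 0.75, kept as positive
    ("a b c", "a b x", 0),    # similarity 0.50, kept as negative
    ("a b c", "z", 0),        # similarity 0.00, filtered out
]
groups = build_training_groups(example_pairs, toy_similarity)
```

If train_group_size=2 means one positive plus one negative per query, then each record above would contribute a single sampled negative per training step. Please correct me if any of these assumptions differ from what was actually done.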

Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint

https://github.com/EleutherAI/lm-evaluation-harness