FlagEmbedding

Candidate range when mining negative samples

Open AugustLHHHHHH opened this issue 1 year ago • 2 comments

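Hello, I have a follow-up question about the range from which negatives are selected during mining.

In the multilingual version of the msmarco data (https://microsoft.github.io/msmarco/), each query comes with one negative sample.

When mining more negatives with hn_mine.py, are they selected from the existing neg entries in input_file, or from somewhere else? Also, can candidate_pool be set to the corpus (the collections provided by msmarco) with the test-set documents excluded? Thanks.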

Originally posted by @AugustLHHHHHH in https://github.com/FlagOpen/FlagEmbedding/issues/785#issuecomment-2112418090

AugustLHHHHHH • May 15 '24 12:05

@AugustLHHHHHH , we mined hard negatives from the entire corpus of msmarco.

staoxiao • May 15 '24 17:05

Thanks!

AugustLHHHHHH • May 16 '24 07:05
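
For readers following the thread, here is a rough sketch of what mining hard negatives from an entire corpus can look like. It is not the code of hn_mine.py: the function name `mine_hard_negatives`, its parameters, and the toy two-dimensional vectors are all assumptions made for illustration, and a real setup would use an embedding model and an approximate nearest-neighbour index instead of a brute-force sort. Since the pool here is just a list, filtering out test-set documents before mining would be a plain list filter in this sketch; whether that matches how candidate_pool behaves in the script is not confirmed by the thread.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(query_vec, positives, corpus, corpus_vecs,
                        rank_range=(0, 3), num_negatives=2, seed=0):
    """Pick negatives for one query from the whole candidate pool.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Rank every passage in the pool by similarity to the query.
    order = sorted(range(len(corpus)),
                   key=lambda i: dot(query_vec, corpus_vecs[i]),
                   reverse=True)
    start, end = rank_range
    # Keep passages inside the rank window, skipping known positives.
    window = [corpus[i] for i in order[start:end]
              if corpus[i] not in positives]
    # Sample from the window so the result is not always the top hits.
    rng = random.Random(seed)
    return rng.sample(window, min(num_negatives, len(window)))


# Toy pool of five passages with made-up embeddings.
corpus = ["p0", "p1", "p2", "p3", "p4"]
corpus_vecs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
query_vec = [1.0, 0.0]

negs = mine_hard_negatives(query_vec, {"p0"}, corpus, corpus_vecs,
                           rank_range=(0, 4), num_negatives=2)
print(negs)
```

In this toy run the positive `p0` ranks first and is skipped, so the two negatives are drawn from `p1`, `p2`, and `p3`, the next most similar passages in the pool.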