FlagEmbedding
Candidate range for negative sample mining
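Hello, I have a follow-up question about the range that negatives are selected from during mining.
In the multilingual version of the msmarco data, https://microsoft.github.io/msmarco/, each question has one negative sample.
When mining more negatives with hn_mine.py, are they selected from the existing neg entries in input_file, or from somewhere else? Also, can candidate_pool be set to the documents in the corpus (the collections provided by msmarco) with the test-set documents excluded? Thanks.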
Originally posted by @AugustLHHHHHH in https://github.com/FlagOpen/FlagEmbedding/issues/785#issuecomment-2112418090
@AugustLHHHHHH , we mined hard negatives from the entire corpus of msmarco.
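For readers following along, here is a minimal sketch of what mining from the whole corpus (rather than from the existing neg entries) can look like. It is not the code in hn_mine.py: the function names, the toy bag-of-words similarity, and the `sample_range` rank window are all assumptions made for illustration, and a real setup would rank candidates with a dense embedding model.

```python
import math
import random
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption); stands in for a dense encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mine_hard_negatives(query, positives, candidate_pool,
                        sample_range=(0, 3), k=2, seed=0):
    """Hypothetical helper: rank every document in the candidate pool
    against the query, keep the given rank window, drop known positives,
    then sample up to k negatives from what remains."""
    q = embed(query)
    ranked = sorted(candidate_pool,
                    key=lambda doc: cosine(q, embed(doc)),
                    reverse=True)
    start, end = sample_range
    window = [doc for doc in ranked[start:end] if doc not in positives]
    rng = random.Random(seed)
    return rng.sample(window, min(k, len(window)))


# The candidate pool is the entire (toy) corpus, not a per-query neg list.
corpus = [
    "python is a programming language",
    "python snakes live in tropical forests",
    "java is a programming language",
    "the weather is sunny today",
]
query = "what is python programming"
positives = ["python is a programming language"]

negs = mine_hard_negatives(query, positives, corpus)
print(negs)
```

The point of the sketch is only that the negatives come from ranking the full pool and filtering out the positives, so a query can receive more negatives than the single one shipped with the data.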
Thanks!