
teacher score

Open · drewskidang opened this issue 1 year ago · 4 comments

I have a training dataset with query, pos, and neg fields. Is there a script that supports knowledge distillation, i.e., using teacher scores for the pos and neg passages?

drewskidang · Jul 07 '24 18:07

Same question. I'm using BGE embedding, by the way.

malongfei1993 · Jul 09 '24 02:07

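FlagEmbedding\baai_general_embedding\finetune\data.py, line 73:

    def padding_score(self, teacher_score):
        # Find the group size (1 positive + k negatives) from the first
        # example in the batch that actually has teacher scores.
        group_size = None
        for scores in teacher_score:
            if scores is not None:
                group_size = len(scores)
                break
        # No example in the batch has teacher scores: nothing to pad.
        if group_size is None:
            return None

        # Placeholder for examples without teacher scores: a large score for
        # the first passage (the positive) and 0.0 for every negative.
        padding_scores = [100.0] + [0.0] * (group_size - 1)
        new_teacher_score = []
        for scores in teacher_score:
            if scores is None:
                new_teacher_score.append(padding_scores)
            else:
                new_teacher_score.append(scores)
        return new_teacher_score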

malongfei1993 · Jul 09 '24 02:07
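The padding value `[100.0] + [0.0] * (group_size - 1)` is easiest to understand once the scores pass through a softmax. Assuming the training code turns teacher scores into a target distribution with a softmax (the usual approach for this kind of distillation; not verified against the code that consumes `padding_score`), a padded group becomes an essentially one-hot target on the positive, which is the same label plain contrastive training uses. A small illustration, where `softmax` and `padded_scores` are hypothetical helpers:

```python
import math
from typing import List


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one query's group of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def padded_scores(group_size: int) -> List[float]:
    # Same construction as padding_score: a large score for the first
    # passage (the positive) and zeros for the negatives.
    return [100.0] + [0.0] * (group_size - 1)


target = softmax(padded_scores(4))
print([round(p, 6) for p in target])  # [1.0, 0.0, 0.0, 0.0]
```

So examples that lack teacher scores still contribute a sensible target instead of breaking the batch; only examples with real scores get genuinely soft labels.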

You can use bge-reranker-v2 to compute scores for the pos and neg passages, then use the bge-m3 script to fine-tune the model via distillation: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/unified_finetune#2-data-format

staoxiao · Jul 10 '24 08:07


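Regarding the padding_score code above: this feature hasn't actually been completed for bge embedding, has it? I couldn't find any code that goes on to use these scores. I can see that m3 supports it.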

liuslab · Aug 16 '24 08:08
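For a rough picture of what using teacher scores involves on the training side, the sketch below computes a soft-label distillation loss: each teacher group [pos, neg_1, ..., neg_k] is turned into a target distribution with a softmax, and the student's log-softmax over the same group is scored against it with cross-entropy. This is a generic technique written from scratch, not the bge or bge-m3 training code; `log_softmax`, `distill_loss`, and `temperature` are hypothetical names.

```python
import math
from typing import List


def log_softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one query's group of scores."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distill_loss(student_scores: List[List[float]],
                 teacher_scores: List[List[float]],
                 temperature: float = 1.0) -> float:
    """Mean soft-label cross-entropy between teacher and student groups.

    Each inner list holds one query's scores for [pos, neg_1, ..., neg_k].
    """
    total = 0.0
    for s, t in zip(student_scores, teacher_scores):
        # Teacher scores -> target probabilities.
        target = [math.exp(v) for v in log_softmax(t, temperature)]
        student_log_probs = log_softmax(s, temperature)
        total -= sum(p * lp for p, lp in zip(target, student_log_probs))
    return total / len(student_scores)


# A padded teacher group ([100, 0, 0]) acts like a hard label on the positive,
# so an untrained student with equal scores pays -log(1/3) = log(3).
loss = distill_loss([[0.0, 0.0, 0.0]], [[100.0, 0.0, 0.0]])
print(round(loss, 4))  # 1.0986
```

Because the padded groups reduce to ordinary cross-entropy on the positive, a batch can mix examples with and without teacher scores under a single loss.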