Shitao Xiao comments

Results 509 comments of Shitao Xiao

How to choose value for train_group_size in finetuning?

Hi, @chansonzhang

Q1:
- We randomly sample train_group_size-1 negatives from "neg": List[str].
- All passages in the same batch (except the positive) will be used as negatives.

For example, a...
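A minimal sketch of the sampling rule described in this reply, written in plain Python rather than taken from the FlagEmbedding training code. The names `build_group` and `negatives_per_query`, the sample record (including its "query" and "pos" fields), and the fallback of sampling with replacement when "neg" is too short are all assumptions for illustration. The negative count also assumes a single device with no negatives shared across devices.

```python
import random
from typing import Dict, List


def build_group(example: Dict, train_group_size: int, rng: random.Random) -> List[str]:
    """Return one positive followed by train_group_size - 1 sampled negatives."""
    positive = rng.choice(example["pos"])
    num_neg = train_group_size - 1
    negatives = example["neg"]
    if len(negatives) >= num_neg:
        sampled = rng.sample(negatives, num_neg)
    else:
        # Assumption: sample with replacement when "neg" has too few entries.
        sampled = [rng.choice(negatives) for _ in range(num_neg)]
    return [positive] + sampled


def negatives_per_query(batch_size: int, train_group_size: int) -> int:
    """Every passage in the batch except the query's own positive is a negative."""
    return batch_size * train_group_size - 1


# Hypothetical training record.
example = {
    "query": "what is a panda?",
    "pos": ["The giant panda is a bear native to China."],
    "neg": [
        "Paris is the capital of France.",
        "Python is a programming language.",
        "The sky is blue.",
    ],
}

group = build_group(example, train_group_size=3, rng=random.Random(0))
print(len(group))                  # one positive plus two negatives
print(negatives_per_query(2, 3))   # passages a query is contrasted against
```

Under these assumptions, raising train_group_size increases both the sampled negatives per query and the number of in-batch negatives, at the cost of more memory per step.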

When evaluating bge-m3 with FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py, how do I choose between dense, sparse, and colbert?

FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py currently supports only dense. To test the hybrid modes, see https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MKQA

Size mismatch when saving the model after training with deepspeed

You can try stage 0 or 1; models of the large size do not need stage 3 enabled.
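A hedged sketch of what switching the stage might look like: the fragment below sets only the ZeRO stage and prints it as JSON. It is not a complete DeepSpeed config, and the variable names are hypothetical; the fragment would need to be merged into whatever config file the training run already uses.

```python
import json

# Fragment only: set the ZeRO stage to 1 (use 0 to disable ZeRO partitioning).
ds_config_fragment = {
    "zero_optimization": {
        "stage": 1,
    }
}

config_text = json.dumps(ds_config_fragment, indent=2)
print(config_text)
```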

An empty string has a similarity above 0.5 with every other string?

See FAQ-2: https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding#frequently-asked-questions

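@wwz0123, hello. Ranking depends on the relative order of the similarity scores, not on their absolute values. See the earlier answer, FAQ: The similarity score between two dissimilar sentences is higher than 0.5.

![image](https://github.com/FlagOpen/FlagEmbedding/assets/29973524/0fd54401-7737-4224-bb40-193bf6786a43)

The similarity distributions of bge-v1.5 and bge-m3 are somewhat more uniform. Also, an empty string still produces character output, which is normal.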
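To illustrate the point about relative order, here is a small sketch with made-up scores, all above 0.5. The candidate texts, the scores, and the helper name `rank_by_score` are hypothetical and do not come from any model.

```python
from typing import List, Tuple


def rank_by_score(candidates: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Sort candidates from most to least similar to the query."""
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


# Made-up scores: every value exceeds 0.5, yet the ordering is still informative.
candidates = ["relevant passage", "loosely related passage", ""]
scores = [0.84, 0.70, 0.55]

ranking = rank_by_score(candidates, scores)
for text, score in ranking:
    print(f"{score:.2f}  {text!r}")
```

The empty string ends up last even though its score is above 0.5, which is all that a retrieval or ranking step needs.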

An empty string has a similarity above 0.5 with every other string?

@TChengZ The first row, [0.8384 0.7036], is the similarity of "样例数据-1" to sentences_2, and the second row is the similarity of "样例数据-2" to sentences_2.

An empty string has a similarity above 0.5 with every other string?

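> > @TChengZ The first row, [0.8384 0.7036], is the similarity of "样例数据-1" to sentences_2, and the second row is the similarity of "样例数据-2" to sentences_2.
>
> One more question: the FAQ directly uses
>
> ```
> similarity = embeddings_1 @ embeddings_2.T
> ```
>
> Is this similarity computation the same as computing cosine similarity myself?

Yes.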
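The equivalence holds when the embeddings are unit length, which is the assumption made here. A minimal check in plain Python, with hypothetical helper names and toy vectors instead of real model output:

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for embeddings.
a = [3.0, 4.0]
b = [1.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# After normalization the plain dot product equals the cosine similarity.
print(math.isclose(dot(a_unit, b_unit), cosine(a, b)))  # True
# Without normalization the two differ.
print(math.isclose(dot(a, b), cosine(a, b)))            # False
```

If the embeddings are not normalized, the matrix product gives unnormalized dot products, and the cosine has to be computed explicitly.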

An empty string has a similarity above 0.5 with every other string?

> ```
> # -*- coding: utf-8 -*-
> from FlagEmbedding import FlagModel
> model = FlagModel('/xxx/bge-m3',
>                   query_instruction_for_retrieval="答案比较",
>                   use_fp16=True)  # Setting use_fp16 to True speeds up computation with...
> ```

bge-reranker-v2-minicpm-layerwise gives very low scores, often negative

The 'query' and 'passage' are not very relevant to each other in the first place, so a low score is normal.
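If the reranker output is an unbounded raw score (a logit), which is an assumption here and not stated in the reply, then a negative value only means "less relevant than the midpoint". A sigmoid maps such scores into (0, 1) without changing their order. The scores below are made up for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Map an unbounded score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Made-up raw scores; negative values correspond to outputs below 0.5.
raw_scores: List[float] = [-5.2, -0.3, 4.1]
normalized = [sigmoid(s) for s in raw_scores]

for raw, prob in zip(raw_scores, normalized):
    print(f"{raw:+.1f} -> {prob:.3f}")
```

The mapping is monotonic, so ranking by the raw score and ranking by the mapped value give the same order.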

I ran into a problem when running the test set. Am I using it incorrectly?

You can try downgrading the mteb library.