Shitao Xiao comments

Results 509 comments of Shitao Xiao

bge-m3 fine-tune

We suggest checking whether any files are missing from data/models/bge-m3; it runs without problems on our side.

Using bge m3 for document question answering

Yes, the m3 model does not need an instruction to be added.

Is there an example of accelerated Docker CPU deployment?

You can refer to https://github.com/huggingface/text-embeddings-inference

FlagEmbedding and LlamaIndex

Hi, thanks for your interest in our work. You can use HuggingFaceEmbedding to load the bge model in LlamaIndex. LlamaIndex also has its own training script that you can use. If you...

BGE-M3 compute_score function is very inefficient

Thanks for your interest in our work! `compute_score` is an example of how to compute the hybrid scores. If you have a better implementation, you are welcome to submit a PR. If you...
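
As a rough illustration of what a hybrid score is, the sketch below combines dense, sparse, and multi-vector scores for one query-passage pair with a weighted average. The function name `hybrid_score`, the weights, and the sample scores are all hypothetical; this is not the FlagEmbedding implementation of `compute_score`.

```python
def hybrid_score(dense, sparse, multi_vector, weights=(0.4, 0.2, 0.4)):
    """Weighted average of three relevance scores for one (query, passage) pair.

    The default weights are illustrative assumptions, not library defaults.
    """
    w_dense, w_sparse, w_multi = weights
    total = w_dense + w_sparse + w_multi
    return (w_dense * dense + w_sparse * sparse + w_multi * multi_vector) / total


# Hypothetical scores for a single (query, passage) pair.
score = hybrid_score(dense=0.62, sparse=0.10, multi_vector=0.70)
```

Dividing by the sum of the weights keeps the result on the same scale as the inputs even when the weights do not add up to 1.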

SafetensorError: Error while deserializing header: HeaderTooSmall

A possible cause is an old version of transformers. You can try upgrading transformers.

MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31

The error seems to be related to your device. A smaller batch size may help; you can set one in https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/flag_dres_model.py#L16
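
To show why a smaller batch size can help with size-limit errors, here is a minimal sketch that encodes texts in fixed-size batches, so that no single call has to handle the full input. The helpers `iter_batches` and `encode_in_batches` and the stand-in encoder `fake_encode` are hypothetical and are not part of FlagEmbedding or C_MTEB.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_in_batches(texts, encode_fn, batch_size=16):
    """Encode texts batch by batch and concatenate the results.

    encode_fn is a hypothetical callable: list of texts -> list of vectors.
    """
    vectors = []
    for batch in iter_batches(texts, batch_size):
        vectors.extend(encode_fn(batch))
    return vectors


# Stand-in encoder for demonstration only: each text becomes a 1-d "vector".
def fake_encode(batch):
    return [[float(len(text))] for text in batch]


embeddings = encode_in_batches(["a", "bb", "ccc"], fake_encode, batch_size=2)
```

With a real model, `encode_fn` would be the model's own encode call, and lowering `batch_size` reduces the size of the largest tensor allocated per step.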

Question about changes in the similarity distribution

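If the negative samples contain false negatives, or are too similar to the positive samples, the overall scores will drop. This is fine as long as the positive samples still score higher than the negative samples.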

Question about changes in the similarity distribution

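> One additional piece of information from the experiments: whether the normalized parameter is enabled has a large effect on the model's output distribution. normalized normalizes the vectors, so the final score is the cosine similarity, with a range of [-1, 1]. If it is set to False, similarity is computed as the inner product of the vectors, and the inner product has no bounded range.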
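
The difference between the two settings can be sketched in plain Python. The vectors below are made up for illustration, and the helper names are hypothetical; the sketch only shows the general relationship between the inner product and cosine similarity, not the model's own code.

```python
import math


def inner_product(a, b):
    """Unnormalized similarity: plain dot product, with no bounded range."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Inner product of the normalized vectors; always within [-1, 1]."""
    return inner_product(l2_normalize(a), l2_normalize(b))


query = [3.0, 4.0]
doc = [6.0, 8.0]  # same direction as query, twice the length

raw = inner_product(query, doc)      # grows with vector length
cos = cosine_similarity(query, doc)  # depends only on direction, close to 1.0
```

Because the raw inner product scales with vector length, its score distribution can shift substantially between models or training runs, while the cosine score stays within [-1, 1].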

Which vector database can support the three hybrid retrieval methods in bge-m3?

Currently Vespa supports this fairly well: https://github.com/vespa-engine/pyvespa/blob/master/docs/sphinx/source/examples/mother-of-all-embedding-models-cloud.ipynb