FlagEmbedding

Retrieval and Retrieval-augmented LLMs

622 FlagEmbedding issues

After fine-tuning the embedding model, I wanted to evaluate it on data I prepared myself, but after passing in the weight path I kept getting the error "Model name 'model_train' not found in the model mapping". After digging for a long time, I found the cause: in FlagEmbedding's design, the framework needs to know both the model type and the weight path. The model type (e.g. bge-m3, bge-base-zh) tells the framework which class (BGEM3FlagModel, etc.) to instantiate; the weight path tells it which saved checkpoint to use when initializing the parameters. When reading --embedder_name_or_path, the framework takes the last directory component as the model name and looks it up in AUTO_EMBEDDER_MAPPING. If the directory is named checkpoint-2110, the framework looks up "checkpoint-2110" as a model name, finds no such entry in the mapping, and raises the error. So we also need to add...
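One workaround (a minimal sketch, assuming the checkpoint is a fine-tuned bge-m3; the path below is the issue's example, not a real location) is to load the weights directly with the model class, which sidesteps the directory-name lookup in AUTO_EMBEDDER_MAPPING entirely:

```python
from FlagEmbedding import BGEM3FlagModel

# Load the fine-tuned weights directly by path; no name lookup is involved.
# "model_train/checkpoint-2110" stands in for the checkpoint directory.
model = BGEM3FlagModel("model_train/checkpoint-2110", use_fp16=True)

# encode() returns a dict; the dense vectors live under "dense_vecs".
embeddings = model.encode(["sample query"])["dense_vecs"]
print(embeddings.shape)
```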

Thank you very much for BGE-M3! I am implementing something similar, and I found a line in your code that puzzles me a bit: https://github.com/FlagOpen/FlagEmbedding/blob/2225aacb54cf9e807aa116dfffeb0cceb291b38b/FlagEmbedding/finetune/embedder/encoder_only/m3/modeling.py#L227 Might it be that the colbert...

When I use BGE-Code-V1 (based on Qwen2.5-Coder-1.5B) as the retriever in my RAG pipeline, I find that query–chunk similarity scores are always around ~0.541, regardless of the query and document content. Task:...
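Not the model itself, but a self-contained illustration of one plausible cause: if every embedding shares a large common component (anisotropy), pairwise cosine similarity collapses toward a constant regardless of content. All values below are made up for the demonstration:

```python
import numpy as np

# Illustration only: vectors that share a dominant common direction
# produce near-constant pairwise cosine similarity, whatever the "content".
rng = np.random.default_rng(0)
dim = 1536
common = rng.normal(size=dim)                      # shared component
vecs = [common + 0.9 * rng.normal(size=dim) for _ in range(4)]
vecs = [v / np.linalg.norm(v) for v in vecs]       # L2-normalize

for v in vecs:
    print([round(float(v @ w), 3) for w in vecs])  # off-diagonals all ~0.55
```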

Great code! Can you share whether you fine-tuned Mistral or only used smart prompting?

The encoder trainers all appear to be train-only, which seems really odd to me. Please explain the design choice not to run evaluation during training; it seems very standard.

```python
def _pool(self, embeddings, attention_mask):
    if "mean" in self.pooling_method:
        # Zero out padded positions, then average over the real tokens.
        embeddings = embeddings.masked_fill(
            ~attention_mask[..., None].bool(), 0.0)
        embedding = embeddings.sum(
            dim=1) / attention_mask.sum(dim=1, keepdim=True)
    elif "cls" in self.pooling_method:
        # Take the hidden state of the first ([CLS]) token.
        embedding = embeddings[:, 0]
    elif ...
```
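A toy check of the mean branch (a sketch, assuming PyTorch tensors; the values are invented) shows how padding positions are zeroed and excluded from the average:

```python
import torch

embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])  # (batch=1, seq=3, dim=2)
attention_mask = torch.tensor([[1, 1, 0]])  # last position is padding

masked = embeddings.masked_fill(~attention_mask[..., None].bool(), 0.0)
mean_pooled = masked.sum(dim=1) / attention_mask.sum(dim=1, keepdim=True)
print(mean_pooled)  # tensor([[2., 3.]]) — the average of the two real tokens
```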

Why does the empty string have a similarity above 0.5 with every other string? ![image](https://github.com/FlagOpen/FlagEmbedding/assets/161291221/86732cab-39e2-42d9-8201-a61f2ce623c4) The similarities: ![image](https://github.com/FlagOpen/FlagEmbedding/assets/161291221/12b916ff-4c7e-4e50-abd2-dd9fb4482767) The printed vector of the empty string: ![image](https://github.com/FlagOpen/FlagEmbedding/assets/161291221/4957c801-95d6-46c0-b147-1ce0422d9958)
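A minimal reproduction sketch (assuming the model in question is bge-m3 and its dense vectors are used): an empty input is still tokenized into the special tokens, so its embedding is not a zero vector and has a nonzero similarity baseline against any other string:

```python
import numpy as np
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3")
vecs = model.encode(["", "hello world"])["dense_vecs"]

a, b = vecs[0], vecs[1]
cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cos)  # well above zero: the empty string's vector comes from special tokens
```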

I found that the vector results for "Hangzhou City" obtained by these two methods are different. What is the reason? This is the code: ` model = HuggingFaceEmbedding( model_name='/home/nepf/hwd/bge-m3/', device="cpu", )...
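One common cause is a pooling or normalization mismatch between the two wrappers. A sketch (treat the pooling choice as an assumption to verify against both pipelines) of computing bge-m3's dense vector by hand as the L2-normalized CLS hidden state:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Compute the dense vector manually, then compare it against each wrapper's
# output to see which one deviates (e.g. by mean-pooling or skipping the norm).
tok = AutoTokenizer.from_pretrained("BAAI/bge-m3")
mdl = AutoModel.from_pretrained("BAAI/bge-m3")
mdl.eval()

inputs = tok(["Hangzhou City"], return_tensors="pt")
with torch.no_grad():
    hidden = mdl(**inputs).last_hidden_state        # (batch, seq, dim)
cls_vec = torch.nn.functional.normalize(hidden[:, 0], dim=-1)
print(cls_vec[0, :5])
```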

Hello! Thank you very much for open-sourcing these two models, the strongest multilingual models in the industry, supporting 190+ languages. I now want to narrow the language coverage down to seven: Chinese, English, Japanese, Korean, Spanish, French, and Arabic. Could you advise on how to do this? Some of my ideas:

1. Fine-tune bge-m3 and bge-rerank-v2-m3 on data in these seven languages, so the parameters shift toward them.
2. Prune the vocabulary to keep only these seven languages, then repeat step 1.

Are these feasible?
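For idea 2, a hedged first step (a sketch; the corpus below is a placeholder, not real data) is to count which token ids the seven-language corpus actually uses, so the embedding matrix can later be shrunk to those rows plus the special tokens before fine-tuning:

```python
from collections import Counter
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("BAAI/bge-m3")
corpus = ["杭州市", "Hello world", "Bonjour", "안녕하세요"]  # placeholder corpus

# Count every token id the corpus touches; unused rows of the embedding
# matrix are candidates for removal when trimming the vocabulary.
used = Counter()
for text in corpus:
    used.update(tok(text)["input_ids"])
print(len(used), "distinct token ids used out of", tok.vocab_size)
```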