FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Running examples/inference/embedder/encoder_only/auto_m3_multi_devices.py works on a single GPU, but as soon as I specify two GPUs via `devices` it fails with a NoneType error:

```
Traceback (most recent call last):
  File "/tmp/pycharm_project_65/FlagEmbedding-master/du_test/test_start.py", line 45, in <module>
    test_m3_multi_devices()
  File "/tmp/pycharm_project_65/FlagEmbedding-master/du_test/test_start.py", line 34, in test_m3_multi_devices
    dense_scores = queries_embeddings["dense_vecs"] @ passages_embeddings["dense_vecs"].T
TypeError: 'NoneType' object is not subscriptable
```

Any pointers would be appreciated.
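For reference, a minimal sketch of the multi-device setup that example revolves around, assuming `BGEM3FlagModel` with an explicit `devices` list and `return_dense=True` (model path, queries, and passages are placeholders):

```python
from FlagEmbedding import BGEM3FlagModel

if __name__ == "__main__":  # multi-device encoding spawns worker processes
    model = BGEM3FlagModel(
        "BAAI/bge-m3",
        use_fp16=True,
        devices=["cuda:0", "cuda:1"],  # two GPUs
    )

    queries = ["What is BGE M3?"]
    passages = ["BGE M3 is a multilingual embedding model supporting dense retrieval."]

    # Explicitly request dense vectors so the returned dict has a "dense_vecs" entry.
    queries_embeddings = model.encode_queries(queries, return_dense=True)
    passages_embeddings = model.encode_corpus(passages, return_dense=True)

    dense_scores = queries_embeddings["dense_vecs"] @ passages_embeddings["dense_vecs"].T
    print(dense_scores)
```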
This is my training script. On the same data, my fine-tuned bge-small-zh-v1.5 performs better on the validation set. Why? Doesn't qwen0.5B have more parameters? Any pointers on what might be going wrong would be appreciated.
Hi, for the n images retrieved with CLIP, is there a reranker model for image-text retrieval, i.e., a single-tower model that takes an image and a text as input and outputs a similarity score?
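For context, a minimal sketch of the two-tower CLIP retrieval stage the question refers to, using the Hugging Face `transformers` CLIP classes (checkpoint name and image paths are placeholders); the question is about adding a single-tower reranker on top of this:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder checkpoint and images; any CLIP-style checkpoint works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in ["img1.jpg", "img2.jpg", "img3.jpg"]]
query = "a photo of a cat"

inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_text: (num_texts, num_images) similarity scores used to rank the images.
scores = outputs.logits_per_text[0]
ranked = scores.argsort(descending=True)
print(ranked.tolist())
```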
Thanks for sharing this great work! I was curious about how the bge-en-icl model is evaluated. I tried following the examples on [this page](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/evaluation#1-mteb), but it didn't work out. It...
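For what it's worth, here is a rough sketch of one way to point the `mteb` library at bge-en-icl, assuming `FlagICLModel` from FlagEmbedding and a thin adapter exposing the `encode()` interface mteb expects (task name, instruction string, and output folder are placeholders, not the repo's official evaluation setup):

```python
import mteb
from FlagEmbedding import FlagICLModel

class BgeIclWrapper:
    """Minimal adapter exposing the encode() interface mteb expects."""

    def __init__(self):
        # Zero-shot here; examples_for_task can be supplied for the ICL setting.
        self.model = FlagICLModel(
            "BAAI/bge-en-icl",
            query_instruction_for_retrieval="Given a web search query, retrieve relevant passages that answer the query.",
            use_fp16=True,
        )

    def encode(self, sentences, **kwargs):
        return self.model.encode(sentences)

tasks = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(BgeIclWrapper(), output_folder="results/bge-en-icl")
```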
Hello, this is how I load and run the reranker:

```python
from FlagEmbedding import LayerWiseFlagLLMReranker

# Load the reranker model
loaded_reranker_model = LayerWiseFlagLLMReranker(
    'path_to_original_bge-reranker-v2-minicpm-layerwise',
    model_class='decoder-only-layerwise',
    query_max_length=256,
    passage_max_length=1024,
    use_fp16=True,
    devices=['cuda:1']
)

# Inference:
scores = loaded_reranker_model.compute_score(pairs, cutoff_layers=[28], normalize=True)

# Getting teacher scores:
scores = loaded_reranker_model.compute_score(pairs, cutoff_layers=[28])
```

Can the teacher scores obtained this way be used for fine-tuning? Are `cutoff_layers` and `normalize` needed?
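For illustration, a sketch of how such teacher scores are commonly attached to training data, assuming the `pos_scores`/`neg_scores` knowledge-distillation fields of FlagEmbedding's JSONL fine-tuning format; the query and passages below are made up, and `loaded_reranker_model` is the reranker created above:

```python
import json

def as_list(x):
    # compute_score returns a bare float for a single pair; normalize to a list.
    return x if isinstance(x, list) else [x]

query = "what is panda?"
pos = ["The giant panda is a bear species endemic to China."]
neg = ["pandas is a Python data-analysis library.", "The red panda is a small mammal."]

pos_scores = [float(s) for s in as_list(
    loaded_reranker_model.compute_score([[query, p] for p in pos], cutoff_layers=[28]))]
neg_scores = [float(s) for s in as_list(
    loaded_reranker_model.compute_score([[query, n] for n in neg], cutoff_layers=[28]))]

record = {
    "query": query,
    "pos": pos,
    "neg": neg,
    "pos_scores": pos_scores,  # raw (unnormalized) teacher scores
    "neg_scores": neg_scores,
}
with open("train_with_teacher_scores.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```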
When serving bge-reranker-v2.5-gemma2-lightweight with FlagEmbedding for inference, two pairs get raw scores of 13 and 27, but with normalize=True both come out as 0.99. That doesn't look right.
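Assuming `normalize=True` simply applies a sigmoid to the raw score, as FlagEmbedding's rerankers document, both values saturating near 1 is the expected behavior; a quick check:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# The two raw scores from the question.
for raw in (13, 27):
    print(raw, sigmoid(raw))
# 13 -> ~0.9999977, 27 -> ~0.999999999998: the sigmoid saturates for large
# positive inputs, so both pairs show up as 0.99+ even though 27 >> 13.
```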
Since I need an API for online serving, I looked at vLLM and found it supports XLMRobertaModel, so I used it to load the bge-m3 model. vLLM code:

```python
from vllm import LLM

prompts = ['精通excel', '银行项目', '市场营销']

model = LLM(
    model="/bge/bge-m3",
    task="embed",
    enforce_eager=True,
)
outputs = model.embed(prompts)
for prompt, output in zip(prompts, outputs):
    embeds...
```
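A possible continuation of that loop, assuming the embedding vector is exposed as `output.outputs.embedding` (the attribute name is my assumption and may differ across vLLM versions); it reuses `prompts` and `outputs` from the snippet above:

```python
import numpy as np

vectors = []
for prompt, output in zip(prompts, outputs):
    # Assumed attribute: vLLM's embedding output carrying a list of floats.
    emb = np.asarray(output.outputs.embedding, dtype=np.float32)
    emb /= np.linalg.norm(emb)  # L2-normalize so dot products act like cosine similarity
    vectors.append(emb)
    print(prompt, emb.shape)

print("similarity('精通excel', '银行项目'):", float(vectors[0] @ vectors[1]))
```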
```python
from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker('/bge-reranker-v2-gemma/bge-reranker-v2-gemma', use_fp16=True)

def reranker_embedding(sentences_1, sentences_2):
    score = reranker.compute_score([sentences_1, sentences_2], normalize=True)
    return score[0]

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["DISABLE_TQDM"] = "true"

reranker_embedding('JOYTIME (HK) INTERNATIONAL COMPANY', 'Voit')
```

The progress bar below keeps showing up. How can I disable it?

```
100%|██████████| 1/1...
```
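One generic workaround is to disable tqdm globally before any progress bar is created; this is a tqdm monkey-patch, not an official FlagEmbedding flag. A minimal sketch:

```python
from functools import partialmethod
from tqdm import tqdm

# Force every tqdm progress bar to be created with disable=True.
tqdm.__init__ = partialmethod(tqdm.__init__, disable=True)

from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker('/bge-reranker-v2-gemma/bge-reranker-v2-gemma', use_fp16=True)
score = reranker.compute_score(['JOYTIME (HK) INTERNATIONAL COMPANY', 'Voit'], normalize=True)
print(score)
```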
Hi developers, I have a question about constructing fine-tuning samples. **Background**: for a given query there is a set of docs to rank, and the docs themselves differ in quality (even among positives, some are better than others). Can a passage appear both as a positive and as a negative sample? For example, with data like this, can I construct training records such as

```
{"query":"刘亦菲","pos":["刘亦菲百度百科"],"neg":["刘亦菲已出道10年","刘一飞的微博"]}
{"query":"刘亦菲","pos":["刘亦菲已出道10年"],"neg":["刘一飞的微博","华语歌手"]}
```

If so, **will it confuse the model that "刘亦菲已出道10年" appears both as a positive and as a negative sample?**
```
  File "/conda/envs/llm/lib/python3.12/site-packages/FlagEmbedding/inference/embedder/encoder_only/m3.py", line 295, in encode
    return super().encode(
           ^^^^^^^^^^^^^^^
  File "/conda/envs/llm/lib/python3.12/site-packages/FlagEmbedding/abc/inference/AbsEmbedder.py", line 275, in encode
    embeddings = self.encode_multi_process(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda/envs/llm/lib/python3.12/site-packages/FlagEmbedding/abc/inference/AbsEmbedder.py", line 416, in encode_multi_process
    [output_queue.get() for _ in...
```