FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Results: 622 FlagEmbedding issues

How to improve the concurrency of bge embedding and reranking?
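The issue body is not shown; as a hedged sketch of the usual first step, batching amortizes GPU work far better than one call per request. The checkpoints below are the public BAAI models, and the batch size and texts are illustrative placeholders:

```python
from FlagEmbedding import FlagModel, FlagReranker

model = FlagModel('BAAI/bge-large-zh-v1.5', use_fp16=True)
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)

# Encode many queued requests in one batched call instead of one GPU
# launch per request; batch_size trades memory for throughput.
queries = ["query-1", "query-2", "query-3"]
embeddings = model.encode(queries, batch_size=256)

# The reranker likewise scores many (query, passage) pairs per call.
pairs = [["query-1", "passage-a"], ["query-1", "passage-b"]]
scores = reranker.compute_score(pairs)
```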

Hello — when a query's label set has more than one element, is it theoretically impossible for recall@1 to reach 1? In other words, for samples with multiple positives, is the maximum achievable recall@1 still below 1?
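For context, a minimal sketch of the point being asked about, assuming the common definition recall@k = (number of positives ranked in the top k) / (total number of positives); under that definition, recall@1 for a query with n positives is capped at 1/n:

```python
def recall_at_k(ranked_ids, positive_ids, k):
    """Fraction of the labeled positives that appear in the top-k results."""
    hits = len(set(ranked_ids[:k]) & set(positive_ids))
    return hits / len(positive_ids)

ranked = ["doc3", "doc7", "doc1"]   # hypothetical retrieval order
positives = ["doc3", "doc1"]        # two labeled positives for this query

print(recall_at_k(ranked, positives, k=1))  # 0.5 -- the best possible at k=1
```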

```python
import argparse
from pathlib import Path
from C_MTEB.tasks import *
from flag_dres_model import FlagDRESModel
from mteb import MTEB
import openai
import tiktoken
import pickle
import os

API_KEY = "********************"
...
```

Q1: What, specifically, does the following description from the [README](https://github.com/FlagOpen/FlagEmbedding/blob/master/examples/reranker/README.md) mean?

> train_group_size: the number of positive and negatives for a query in training. There are always one positive, so this...
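To make the parameter concrete, here is a hedged sketch of the usual interpretation, assuming the documented query/pos/neg JSONL finetuning format: each training group is 1 positive plus train_group_size - 1 negatives sampled from the example's neg list. The example texts below are hypothetical:

```python
import random

# One line of the finetuning JSONL (hypothetical content).
example = {
    "query": "what is dense retrieval",
    "pos": ["Dense retrieval encodes queries and passages as vectors ..."],
    "neg": [
        "Sparse retrieval matches queries and documents by terms ...",
        "BM25 is a bag-of-words ranking function ...",
        "Cross-encoders rerank a candidate list of passages ...",
    ],
}

train_group_size = 3  # 1 positive + (train_group_size - 1) negatives

group = [random.choice(example["pos"])]
group += random.sample(example["neg"], train_group_size - 1)
print(group)  # the passages scored against the query in one training step
```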

Serving with the official transformers example on an A100 with 40 GB of VRAM: calling the model in a loop fills the entire 40 GB, and the memory is never released unless the process is restarted.

```
from FlagEmbedding import FlagModel

sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = FlagModel('BAAI/bge-large-zh-v1.5',
                  query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
                  use_fp16=True)  # Setting use_fp16 to True speeds up computation with...
```
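A hedged sketch (not from the issue thread) of the standard mitigations, assuming PyTorch's public memory APIs: run inference under torch.no_grad() so autograd does not keep activations alive, and explicitly return cached blocks between batches:

```python
import gc

import torch
from FlagEmbedding import FlagModel

model = FlagModel('BAAI/bge-large-zh-v1.5', use_fp16=True)

def encode_batches(batches):
    """Encode batches of sentences while keeping CUDA memory bounded."""
    results = []
    with torch.no_grad():          # no autograd graph for pure inference
        for batch in batches:
            results.append(model.encode(batch))
    gc.collect()                   # drop dangling Python references first
    torch.cuda.empty_cache()       # hand cached blocks back to the driver
    return results
```

Note that empty_cache() only releases PyTorch's caching allocator, not live tensors, so any lingering references must be dropped first.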

A description of my problem: during retrieval I apply a similarity threshold, but some texts recalled above that threshold are not actually very relevant. I would like training to lower the similarity between those texts and the query, but the corpus contains no good positives for them. Two questions: can pos be set to an empty array during training? Or could I instead generate positives from the original query (for example by rewriting it), build a training set that way, and achieve the same goal?
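For illustration, a hedged sketch of the second idea, assuming the query/pos/neg JSONL finetuning format: use a paraphrase of the query as the positive and the weakly relevant recalled texts as hard negatives. The rewrite and texts below are hypothetical placeholders:

```python
import json

row = {
    "query": "适合入门的机器学习书籍",
    "pos": ["机器学习初学者书单推荐"],   # a rewrite of the query as the positive
    "neg": [
        "深度学习服务器选购指南",        # weakly relevant texts recalled above
        "数据标注外包平台对比",          # the threshold, used as hard negatives
    ],
}

with open("train_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(row, ensure_ascii=False) + "\n")
```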

My training launch command:

```bash
torchrun --nnodes $NNODES --nproc_per_node $NPROC_PER_NODE \
    --node_rank $RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT \
    -m FlagEmbedding.baai_general_embedding.finetune.run \
    --output_dir path/to/output \
    --model_name_or_path bge-large-zh-v1.5 \
    --train_data path/to/data.jsonl \
    --learning_rate $LEARNINGRATE \
    ...
```