FlagEmbedding
Retrieval and Retrieval-augmented LLMs
I tried hard negative mining, but when running it on 2 different GPUs (namely a T4 and an A100), the T4 only took a few seconds while the A100 took 20 minutes, and their hard...
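If it helps narrow this down, here is a minimal sketch (not the repository's mining script) for checking whether faiss on the slow machine can actually see its GPU; the dimension and dummy data are assumptions. A GPU count of 0 usually means the CPU-only faiss build is installed, which would explain a much slower search.

```python
import faiss
import numpy as np

# 0 usually means the CPU-only faiss build is installed on this machine
num_gpus = faiss.get_num_gpus() if hasattr(faiss, "get_num_gpus") else 0
print("faiss GPUs visible:", num_gpus)

d = 768                                              # embedding dimension (assumption)
xb = np.random.rand(10_000, d).astype("float32")     # stand-in corpus embeddings
xq = np.random.rand(8, d).astype("float32")          # stand-in query embeddings

index = faiss.IndexFlatIP(d)                         # inner-product index, as used for dense retrieval
if num_gpus > 0:
    index = faiss.index_cpu_to_all_gpus(index)       # move the index onto the available GPU(s)
index.add(xb)
scores, ids = index.search(xq, 10)                   # top-10 candidates per query
print(ids.shape)
```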
Pretraining question
If I want to pretrain BART- or T5-series models based on RetroMAE, how should I go about it? [bart-base-chinese-cluecorpussmall-retromae_batch256_max350.log](https://github.com/FlagOpen/FlagEmbedding/files/15302049/bart-base-chinese-cluecorpussmall-retromae_batch256_max350.log)
Fine-tuning error

At the moment I only see three options: `['decoder', 'encoder', 'encoder-decoder']`. Is there any plan to add crossencoder as well?
"Hello, I encountered a Faiss library error when running the eval_msmarco.py script. How do I resolve this?" this is error: warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported...
When calling the model, an error about a missing encode method is raised. The model in use is Qwen 0.5B: `Error while evaluating CmedqaRetrieval: 'Qwen2Model' object has no attribute 'encode' Traceback (most recent call last): File "/data2/mteb/zhangchi/mteb_eval/run_mteb_chinese.py", line 266, in evaluation.run(model, output_folder=f"qwen/qwen_MNR_supervised_v1") File "/home/zhouzhou/anaconda3/envs/llm2vec/lib/python3.9/site-packages/mteb/evaluation/MTEB.py", line 271, in run raise...
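As a possible workaround, a minimal wrapper sketch: MTEB only needs an object exposing `encode(sentences, **kwargs)` that returns embeddings, which a raw `Qwen2Model` does not provide. The class name `Qwen2Wrapper`, the mean pooling, and the max length below are illustrative assumptions, not code from this repository.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

class Qwen2Wrapper:
    """Hypothetical wrapper giving a Hugging Face model the encode() method MTEB expects."""

    def __init__(self, model_name="Qwen/Qwen2-0.5B", device="cuda"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token   # needed for padded batches
        self.model = AutoModel.from_pretrained(model_name).to(device).eval()
        self.device = device

    @torch.no_grad()
    def encode(self, sentences, batch_size=32, **kwargs):
        embeddings = []
        for i in range(0, len(sentences), batch_size):
            batch = self.tokenizer(sentences[i:i + batch_size], padding=True,
                                   truncation=True, max_length=512,
                                   return_tensors="pt").to(self.device)
            out = self.model(**batch).last_hidden_state              # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1).float()     # (B, T, 1)
            emb = (out * mask).sum(1) / mask.sum(1)                  # mean pooling (assumption)
            embeddings.append(emb.cpu().numpy())
        return np.concatenate(embeddings, axis=0)

# evaluation.run(Qwen2Wrapper(), output_folder="qwen/qwen_MNR_supervised_v1")
```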
Please implement resume-from-checkpoint logic and automatic selection of an available port. Port-unavailable errors are still very common, which is inconvenient for internal company use: we don't run locally, so every run means submitting a job, and when it errors out we have to submit yet another one.
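For the port half of this request, a minimal standard-library sketch: binding to port 0 lets the OS hand back an unused port, which could then be passed as the master port to the launch command (how it would be wired into the training scripts is an assumption).

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))               # port 0 => the OS assigns a free ephemeral port
        return s.getsockname()[1]

print(find_free_port())
```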
Data format question
Hello, when evaluating on msmarco, should the content data be converted into this format: {"content": "A is ...", "B is ...", "C is ..."}? That is, each content is followed by multiple candidate passages.
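For what it's worth, a small illustration of the layout the question seems to describe (several candidate passages grouped under one entry), written as a Python dict. This only restates the question; the key names are assumptions, not a confirmed schema from the msmarco evaluation scripts.

```python
# Illustration only: a hypothetical entry with multiple candidate passages.
# The keys "query" and "content" are assumptions, not the required format.
example = {
    "query": "what is A?",
    "content": ["A is ...", "B is ...", "C is ..."],  # candidate passages for this query
}
```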
I ran into a few problems. I tried fine-tuning bge-m3, bge-reranker-v2, the layerwise variant, and gemma-2b, all following the instructions, and the evaluation dataset format is correct; I have checked everything. The issues are:
1. When running step0-rerank-result ("multivector and all rerank" in https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/MKQA/README.md), I get: Some weights of XLMRobertaModel were not initialized from the model checkpoint at /embedding/m3reranklong and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']. I don't know what causes this.
2. After fine-tuning the embedding model, when evaluating the dense...
When I try to load on multiple GPUs, I get the following error: tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-base') model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-base', device_map='auto') error: `RuntimeError: Expected all tensors to be on the same...
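One common cause, offered as an assumption rather than a confirmed diagnosis: with `device_map='auto'` the model is sharded across GPUs, but the tokenized inputs stay on the CPU unless they are explicitly moved to `model.device`. A minimal sketch of scoring a pair that way:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "BAAI/bge-reranker-base", device_map="auto"
).eval()

pairs = [["what is panda?", "The giant panda is a bear species endemic to China."]]
with torch.no_grad():
    # Move the batch to the device holding the model's first layers
    inputs = tokenizer(pairs, padding=True, truncation=True,
                       max_length=512, return_tensors="pt").to(model.device)
    scores = model(**inputs).logits.view(-1).float()
print(scores)
```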