FlagEmbedding
Retrieval and Retrieval-augmented LLMs
I tried hard negative mining, but when running it on 2 different GPUs (namely a T4 and an A100), the T4 only took a few seconds while the A100 took 20 minutes, and their hard...
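If it helps narrow this down, here is a minimal sketch (not the repository's mining script) for checking whether faiss on the slow machine can actually see its GPU; the dimension and dummy data are assumptions. A GPU count of 0 usually means the CPU-only faiss build is installed, which would explain a much slower search.

```python
import faiss
import numpy as np

# 0 usually means the CPU-only faiss build is installed on this machine
num_gpus = faiss.get_num_gpus() if hasattr(faiss, "get_num_gpus") else 0
print("faiss GPUs visible:", num_gpus)

d = 768                                              # embedding dimension (assumption)
xb = np.random.rand(10_000, d).astype("float32")     # stand-in corpus embeddings
xq = np.random.rand(8, d).astype("float32")          # stand-in query embeddings

index = faiss.IndexFlatIP(d)                         # inner-product index, as used for dense retrieval
if num_gpus > 0:
    index = faiss.index_cpu_to_all_gpus(index)       # move the index onto the available GPU(s)
index.add(xb)
scores, ids = index.search(xq, 10)                   # top-10 candidates per query
print(ids.shape)
```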
Pretraining question
If I want to pretrain BART- or T5-series models based on RetroMAE, how should I go about it? [bart-base-chinese-cluecorpussmall-retromae_batch256_max350.log](https://github.com/FlagOpen/FlagEmbedding/files/15302049/bart-base-chinese-cluecorpussmall-retromae_batch256_max350.log)
Fine-tuning error

At the moment I only see three options: `['decoder', 'encoder', 'encoder-decoder']`. Is there any plan to add crossencoder as well?
"Hello, I encountered a Faiss library error when running the eval_msmarco.py script. How do I resolve this?" this is error: warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported...
When calling the model, an error about a missing encode method is raised. The model in use is Qwen 0.5B: `Error while evaluating CmedqaRetrieval: 'Qwen2Model' object has no attribute 'encode' Traceback (most recent call last): File "/data2/mteb/zhangchi/mteb_eval/run_mteb_chinese.py", line 266, in evaluation.run(model, output_folder=f"qwen/qwen_MNR_supervised_v1") File "/home/zhouzhou/anaconda3/envs/llm2vec/lib/python3.9/site-packages/mteb/evaluation/MTEB.py", line 271, in run raise...
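As a possible workaround, a minimal wrapper sketch: MTEB only needs an object exposing `encode(sentences, **kwargs)` that returns embeddings, which a raw `Qwen2Model` does not provide. The class name `Qwen2Wrapper`, the mean pooling, and the max length below are illustrative assumptions, not code from this repository.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

class Qwen2Wrapper:
    """Hypothetical wrapper giving a Hugging Face model the encode() method MTEB expects."""

    def __init__(self, model_name="Qwen/Qwen2-0.5B", device="cuda"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token   # needed for padded batches
        self.model = AutoModel.from_pretrained(model_name).to(device).eval()
        self.device = device

    @torch.no_grad()
    def encode(self, sentences, batch_size=32, **kwargs):
        embeddings = []
        for i in range(0, len(sentences), batch_size):
            batch = self.tokenizer(sentences[i:i + batch_size], padding=True,
                                   truncation=True, max_length=512,
                                   return_tensors="pt").to(self.device)
            out = self.model(**batch).last_hidden_state              # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1).float()     # (B, T, 1)
            emb = (out * mask).sum(1) / mask.sum(1)                  # mean pooling (assumption)
            embeddings.append(emb.cpu().numpy())
        return np.concatenate(embeddings, axis=0)

# evaluation.run(Qwen2Wrapper(), output_folder="qwen/qwen_MNR_supervised_v1")
```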
Please implement resume-from-checkpoint logic and automatic selection of an available port. Port-unavailable errors are still very common, which is inconvenient for internal company use: we don't run locally, so every run means submitting a job, and when it errors out we have to submit yet another one.
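For the port half of this request, a minimal standard-library sketch: binding to port 0 lets the OS hand back an unused port, which could then be passed as the master port to the launch command (how it would be wired into the training scripts is an assumption).

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))               # port 0 => the OS assigns a free ephemeral port
        return s.getsockname()[1]

print(find_free_port())
```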
Data format question
Hello, when evaluating on msmarco, should the content data be converted into this format: {"content": "A is ...", "B is ...", "C is ..."}? That is, each content is followed by multiple candidate passages.
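For what it's worth, a small illustration of the layout the question seems to describe (several candidate passages grouped under one entry), written as a Python dict. This only restates the question; the key names are assumptions, not a confirmed schema from the msmarco evaluation scripts.

```python
# Illustration only: a hypothetical entry with multiple candidate passages.
# The keys "query" and "content" are assumptions, not the required format.
example = {
    "query": "what is A?",
    "content": ["A is ...", "B is ...", "C is ..."],  # candidate passages for this query
}
```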
I ran into a few problems. I tried fine-tuning bge-m3, bge-reranker-v2, the layerwise variant, and gemma-2b, all following the instructions, and the evaluation dataset format is correct; I have checked everything. The issues are:
1. When running step0-rerank-result ("multivector and all rerank" in https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/MKQA/README.md), I get: Some weights of XLMRobertaModel were not initialized from the model checkpoint at /embedding/m3reranklong and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']. I don't know what causes this.
2. After fine-tuning the embedding model, when evaluating the dense...
When I try to load on multiple GPUs, I get the following error: tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-base') model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-base', device_map='auto') error: `RuntimeError: Expected all tensors to be on the same...
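One common cause, offered as an assumption rather than a confirmed diagnosis: with `device_map='auto'` the model is sharded across GPUs, but the tokenized inputs stay on the CPU unless they are explicitly moved to `model.device`. A minimal sketch of scoring a pair that way:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "BAAI/bge-reranker-base", device_map="auto"
).eval()

pairs = [["what is panda?", "The giant panda is a bear species endemic to China."]]
with torch.no_grad():
    # Move the batch to the device holding the model's first layers
    inputs = tokenizer(pairs, padding=True, truncation=True,
                       max_length=512, return_tensors="pt").to(model.device)
    scores = model(**inputs).logits.view(-1).float()
print(scores)
```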