FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Hello, I downloaded the weights from the Hugging Face mirror, but when loading the model with either FlagModel or SentenceTransformer, the following issues occur. How can I solve this? Traceback...
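For comparison, loading a locally downloaded checkpoint normally looks like the sketch below; the directory path is a placeholder and should point at the folder that contains the config, weight, and tokenizer files:

```python
from FlagEmbedding import FlagModel

# Placeholder path: the local directory holding config.json,
# the weight file (pytorch_model.bin / model.safetensors) and the tokenizer files.
model = FlagModel(
    "/path/to/bge-large-zh-v1.5",
    query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",
    use_fp16=True,
)
embeddings = model.encode(["样例文档-1", "样例文档-2"])
print(embeddings.shape)

# The same directory can usually be loaded with sentence-transformers as well:
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("/path/to/bge-large-zh-v1.5")
```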
Is there a pretraining script similar to the one for bge 1.5?
If my query is "XXX的损失率" (the loss rate of XXX) and the pos is "损失率" (loss rate), how well would fine-tuning work with such data? Also, can I choose a term like "损失量" (loss amount) as the neg, or should I pick completely unrelated terms as negatives? Thanks in advance for your answer.
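For reference, a single training record in the {"query": ..., "pos": [...], "neg": [...]} format expected by the FlagEmbedding fine-tuning scripts looks roughly as below; the concrete query/pos/neg strings are placeholders taken from the question:

```python
import json

# One training record in the fine-tuning data format.
record = {
    "query": "XXX的损失率",
    "pos": ["损失率"],
    # "损失量" is a semantically close hard negative; mixing in a few completely
    # unrelated passages as negatives is also common.
    "neg": ["损失量", "某段完全不相关的文本"],
}
with open("toy_finetune_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```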
When using code like this:

```python
from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('/path/bge-reranker-v2-minicpm-layerwise', use_fp16=True)
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])  # Adjusting 'cutoff_layers' to pick which layers are used for computing...
```
Hello, while reproducing bge's results I found that the score on T2Retrieval stays low, and I suspect this is related to how I processed the t2ranking dataset. When using t2ranking, I treat examples with label 2/3 as pos and label 0/1 as neg, and I am also unsure whether the officially mined negatives should be used. Could you share how your team prepared the t2ranking data for fine-tuning?
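A hedged sketch (not the official preprocessing) of mapping graded judgments into the fine-tuning format might look like the following; the field names "query", "passage" and "label", as well as the file names, are assumptions about a flattened judgment file:

```python
import json
from collections import defaultdict

# Map graded labels to pos/neg: label >= 2 -> pos, label <= 1 -> neg.
pos, neg = defaultdict(list), defaultdict(list)
with open("t2ranking_judgments.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        bucket = pos if row["label"] >= 2 else neg
        bucket[row["query"]].append(row["passage"])

with open("t2ranking_finetune.jsonl", "w", encoding="utf-8") as out:
    for query, positives in pos.items():
        out.write(json.dumps(
            {"query": query, "pos": positives, "neg": neg.get(query, [])},
            ensure_ascii=False,
        ) + "\n")
```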
First of all, many thanks to your team for the excellent open-source work; it has been extremely helpful to us. I ran into a problem when using llm_instruction_reranker and hope to get some clarification. The issue is in the following method, which comes from: https://github.com/FlagOpen/FlagEmbedding/blob/13da7435aba2c4cfbbd7caa4c595fe4862f6ba19/FlagEmbedding/llm_reranker/finetune_for_instruction/trainer.py#L9C2-L29C1 When LoRA is used, the modified save method in modeling is called and the resulting model is as expected; but with full fine-tuning the default save method is used, and the model's keys become model.xxx (they should be xxx), so the checkpoint can no longer be loaded with AutoModelForCausalLM. Is there a reason for this? Shouldn't the save method from modeling be used in both cases? Looking forward to an explanation.

```python
def _save(self, output_dir: Optional[str] = None, state_dict=None):
    if not self.use_lora:
        super()._save(output_dir, state_dict)
        return
    output_dir = output_dir if output_dir is not None else self.args.output_dir
    os.makedirs(output_dir, exist_ok=True)
    ...
```
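One possible workaround sketch (an assumption about the symptom, not the intended fix in the trainer): if a full, non-LoRA checkpoint was written by the default `_save` and its keys carry an extra `model.` prefix, the prefix can be stripped before loading with AutoModelForCausalLM. Paths and the weight filename below are placeholders (newer checkpoints may use model.safetensors instead):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the raw checkpoint and drop the leading "model." from every key.
state_dict = torch.load("checkpoint/pytorch_model.bin", map_location="cpu")
state_dict = {k.removeprefix("model."): v for k, v in state_dict.items()}

# Re-load via the directory that contains config.json, passing the fixed weights.
model = AutoModelForCausalLM.from_pretrained(
    "checkpoint",
    state_dict=state_dict,
)
```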
Can anyone help me, thanks?
I am fine-tuning the m3-base or m3-base-unsupervised. I have a question about the fine-tuning result. I'm fine-tuning using the format of Toy Data in Unified Fine-tuning. I'm using about 200,000+...
Running the command:

```shell
python -m FlagEmbedding.baai_general_embedding.finetune.hn_mine \
--model_name_or_path '/Volumes/移动硬盘/ptrain/output/encoder_model' \
--input_file toy_finetune_data.jsonl \
--output_file toy_finetune_data_minedHN.jsonl \
--range_for_sampling 1-200 \
--negative_number 15
```

without a GPU, it keeps hanging at:

```
_torch_pytree._register_pytree_node(
inferencing embedding for corpus (number=15)--------------
inferencing embedding for queries...
```
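One quick sanity check (assuming the job is merely slow on CPU rather than genuinely stuck) is to time the embedding step directly with the same encoder; the batch size is an arbitrary choice and fp16 is disabled because half precision does not help on CPU:

```python
import json
import time
from FlagEmbedding import FlagModel

# Same local encoder as used by hn_mine, loaded for CPU inference.
model = FlagModel("/Volumes/移动硬盘/ptrain/output/encoder_model", use_fp16=False)

# Collect the passages that hn_mine would embed as its corpus.
corpus = []
with open("toy_finetune_data.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)
        corpus.extend(item.get("pos", []) + item.get("neg", []))

start = time.time()
embeddings = model.encode(corpus, batch_size=8)
print(f"encoded {len(corpus)} passages in {time.time() - start:.1f}s -> {embeddings.shape}")
```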
Questions about training parameters and warnings
Hello,
1. When training reranker-m3, I noticed the model is saved automatically every 500 steps. Where should I modify the save interval, or add extra conditions for saving?
2. During training, both reranker-m3 and bge-reranker-large raise the warning `Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always...
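On the first question: the 500-step checkpoints come from the Hugging Face `TrainingArguments` defaults (`save_strategy="steps"`, `save_steps=500`). Since the fine-tuning script parses `TrainingArguments`, the interval can normally be overridden from the launch command rather than by editing code; a rough sketch of the relevant knobs (output directory and values are placeholders):

```python
from transformers import TrainingArguments

# Equivalent command-line flags would be:
#   --save_strategy steps --save_steps 2000 --save_total_limit 2
args = TrainingArguments(
    output_dir="./reranker_output",
    save_strategy="steps",   # or "epoch" to checkpoint once per epoch
    save_steps=2000,         # default is 500, which is what you are seeing
    save_total_limit=2,      # keep only the newest checkpoints on disk
)
```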