FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Hello, I downloaded the weights from the Hugging Face mirror, but when loading the model with either FlagModel or SentenceTransformer, the following issues occur. How can I solve this? Traceback...
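For comparison, loading a locally downloaded checkpoint normally looks like the sketch below; the directory path is a placeholder and should point at the folder that contains the config, weight, and tokenizer files:

```python
from FlagEmbedding import FlagModel

# Placeholder path: the local directory holding config.json,
# the weight file (pytorch_model.bin / model.safetensors) and the tokenizer files.
model = FlagModel(
    "/path/to/bge-large-zh-v1.5",
    query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",
    use_fp16=True,
)
embeddings = model.encode(["样例文档-1", "样例文档-2"])
print(embeddings.shape)

# The same directory can usually be loaded with sentence-transformers as well:
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("/path/to/bge-large-zh-v1.5")
```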
Is there a pretraining script similar to the one for bge 1.5?
If my query is "XXX的损失率" (the loss rate of XXX) and the pos is "损失率" (loss rate), how well would fine-tuning work with such data? Also, can I choose a term like "损失量" (loss amount) as the neg, or should I pick completely unrelated terms as negatives? Thanks in advance for your answer.
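For reference, a single training record in the {"query": ..., "pos": [...], "neg": [...]} format expected by the FlagEmbedding fine-tuning scripts looks roughly as below; the concrete query/pos/neg strings are placeholders taken from the question:

```python
import json

# One training record in the fine-tuning data format.
record = {
    "query": "XXX的损失率",
    "pos": ["损失率"],
    # "损失量" is a semantically close hard negative; mixing in a few completely
    # unrelated passages as negatives is also common.
    "neg": ["损失量", "某段完全不相关的文本"],
}
with open("toy_finetune_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```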
When using code like this:

```python
from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('/path/bge-reranker-v2-minicpm-layerwise', use_fp16=True)
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])  # Adjusting 'cutoff_layers' to pick which layers are used for computing...
```
Hello, while reproducing bge's results I found that the score on T2Retrieval stays low, and I suspect this is related to how I processed the t2ranking dataset. When using t2ranking, I treat examples with label 2/3 as pos and label 0/1 as neg, and I am also unsure whether the officially mined negatives should be used. Could you share how your team prepared the t2ranking data for fine-tuning?
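A hedged sketch (not the official preprocessing) of mapping graded judgments into the fine-tuning format might look like the following; the field names "query", "passage" and "label", as well as the file names, are assumptions about a flattened judgment file:

```python
import json
from collections import defaultdict

# Map graded labels to pos/neg: label >= 2 -> pos, label <= 1 -> neg.
pos, neg = defaultdict(list), defaultdict(list)
with open("t2ranking_judgments.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        bucket = pos if row["label"] >= 2 else neg
        bucket[row["query"]].append(row["passage"])

with open("t2ranking_finetune.jsonl", "w", encoding="utf-8") as out:
    for query, positives in pos.items():
        out.write(json.dumps(
            {"query": query, "pos": positives, "neg": neg.get(query, [])},
            ensure_ascii=False,
        ) + "\n")
```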
First of all, many thanks to your team for the excellent open-source work; it has been extremely helpful to us. I ran into a problem when using llm_instruction_reranker and hope to get some clarification. The issue is in the following method, which comes from: https://github.com/FlagOpen/FlagEmbedding/blob/13da7435aba2c4cfbbd7caa4c595fe4862f6ba19/FlagEmbedding/llm_reranker/finetune_for_instruction/trainer.py#L9C2-L29C1 When LoRA is used, the modified save method in modeling is called and the resulting model is as expected; but with full fine-tuning the default save method is used, and the model's keys become model.xxx (they should be xxx), so the checkpoint can no longer be loaded with AutoModelForCausalLM. Is there a reason for this? Shouldn't the save method from modeling be used in both cases? Looking forward to an explanation.

```python
def _save(self, output_dir: Optional[str] = None, state_dict=None):
    if not self.use_lora:
        super()._save(output_dir, state_dict)
        return
    output_dir = output_dir if output_dir is not None else self.args.output_dir
    os.makedirs(output_dir, exist_ok=True)
    ...
```
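One possible workaround sketch (an assumption about the symptom, not the intended fix in the trainer): if a full, non-LoRA checkpoint was written by the default `_save` and its keys carry an extra `model.` prefix, the prefix can be stripped before loading with AutoModelForCausalLM. Paths and the weight filename below are placeholders (newer checkpoints may use model.safetensors instead):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the raw checkpoint and drop the leading "model." from every key.
state_dict = torch.load("checkpoint/pytorch_model.bin", map_location="cpu")
state_dict = {k.removeprefix("model."): v for k, v in state_dict.items()}

# Re-load via the directory that contains config.json, passing the fixed weights.
model = AutoModelForCausalLM.from_pretrained(
    "checkpoint",
    state_dict=state_dict,
)
```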
Can anyone help me, thanks?
I am fine-tuning the m3-base or m3-base-unsupervised. I have a question about the fine-tuning result. I'm fine-tuning using the format of Toy Data in Unified Fine-tuning. I'm using about 200,000+...
Running the command:

```shell
python -m FlagEmbedding.baai_general_embedding.finetune.hn_mine \
--model_name_or_path '/Volumes/移动硬盘/ptrain/output/encoder_model' \
--input_file toy_finetune_data.jsonl \
--output_file toy_finetune_data_minedHN.jsonl \
--range_for_sampling 1-200 \
--negative_number 15
```

without a GPU, it keeps hanging at:

```
_torch_pytree._register_pytree_node(
inferencing embedding for corpus (number=15)--------------
inferencing embedding for queries...
```
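One quick sanity check (assuming the job is merely slow on CPU rather than genuinely stuck) is to time the embedding step directly with the same encoder; the batch size is an arbitrary choice and fp16 is disabled because half precision does not help on CPU:

```python
import json
import time
from FlagEmbedding import FlagModel

# Same local encoder as used by hn_mine, loaded for CPU inference.
model = FlagModel("/Volumes/移动硬盘/ptrain/output/encoder_model", use_fp16=False)

# Collect the passages that hn_mine would embed as its corpus.
corpus = []
with open("toy_finetune_data.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)
        corpus.extend(item.get("pos", []) + item.get("neg", []))

start = time.time()
embeddings = model.encode(corpus, batch_size=8)
print(f"encoded {len(corpus)} passages in {time.time() - start:.1f}s -> {embeddings.shape}")
```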
Questions about training parameters and warnings
Hello,
1. When training reranker-m3, I noticed the model is saved automatically every 500 steps. Where should I modify the save interval, or add extra conditions for saving?
2. During training, both reranker-m3 and bge-reranker-large raise the warning `Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always...
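On the first question: the 500-step checkpoints come from the Hugging Face `TrainingArguments` defaults (`save_strategy="steps"`, `save_steps=500`). Since the fine-tuning script parses `TrainingArguments`, the interval can normally be overridden from the launch command rather than by editing code; a rough sketch of the relevant knobs (output directory and values are placeholders):

```python
from transformers import TrainingArguments

# Equivalent command-line flags would be:
#   --save_strategy steps --save_steps 2000 --save_total_limit 2
args = TrainingArguments(
    output_dir="./reranker_output",
    save_strategy="steps",   # or "epoch" to checkpoint once per epoch
    save_steps=2000,         # default is 500, which is what you are seeing
    save_total_limit=2,      # keep only the newest checkpoints on disk
)
```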