FlagEmbedding

Retrieval and Retrieval-augmented LLMs

622 FlagEmbedding issues, sorted by most recently updated

Hello authors, thank you very much for open-sourcing such a great model. I am reproducing BGE-M3 on my own server; so far I have only reproduced the fine-tuning stage, using bge-m3-unsupervised as the base model. The command I run is:
```bash
torchrun --nproc_per_node 8 {WORK_DIR}/run.py \
    --output_dir /cache/output \
    --model_name_or_path {WORK_DIR}/preModel/bge-m3-unsupervised \
    --train_data /cache \
    --learning_rate 1e-5 \
    --fp16 \
    --cache_path {WORK_DIR}/cache_path/ \
    --gradient_checkpointing True \
    --gradient_accumulation_steps=54 \
    --dataloader_pin_memory=True...
```

```bash
# Model: jina-embeddings-v2-base-zh
# Fine-tuning: adapted from the scripts under FlagEmbedding/examples/finetune/embedder/encoder_only/
# Fine-tuning command:
export WANDB_MODE=disabled
train_data="/xxx/xxx/data/finetune_data_score_v2.jsonl"
num_train_epochs=4
per_device_train_batch_size=256
num_gpus=2
if [ -z "$HF_HUB_CACHE" ]; then
    export HF_HUB_CACHE="$HOME/.cache/huggingface/hub"
fi
model_args="\
    --model_name_or_path /SharedNFS/LLM_model/jina-embeddings-v2-base-zh \
    --cache_dir $HF_HUB_CACHE...
```
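
For context, a minimal sketch of the basic jsonl record layout (a query plus lists of positive and negative passages) that the FlagEmbedding fine-tuning examples describe; the texts and the output file name below are made up, and a score-distillation dataset like the one referenced above may carry additional fields:

```python
# Sketch: write one training record in the {"query", "pos", "neg"} jsonl layout
# used by the FlagEmbedding embedder fine-tuning examples; texts are placeholders.
import json

record = {
    "query": "what is dense retrieval?",
    "pos": ["Dense retrieval encodes queries and documents into vectors and ranks by similarity."],
    "neg": ["The weather today is lovely, a good day for a walk."],
}

with open("toy_finetune_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```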

`from FlagEmbedding import BGEM3FlagModel` has recently started raising `AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType'`.
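
A quick diagnostic sketch, assuming the AttributeError comes from an installed pyarrow that predates ListViewType while a newer datasets (pulled in via FlagEmbedding) expects it; the upgrade suggested in the comment is an assumption, not an official recommendation:

```python
# Sketch: check whether the installed pyarrow exposes ListViewType at all.
import pyarrow
import pyarrow.lib

print(pyarrow.__version__)
print(hasattr(pyarrow.lib, "ListViewType"))  # False on older pyarrow builds

# If this prints False, upgrading the pair usually clears the error:
#   pip install --upgrade pyarrow datasets
```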

I have other packages that depend on newer versions of transformers, such as sentence-transformers, marker-pdf, and so on. Could future releases relax the transformers version requirement?
```
flagembedding 1.3.2 has requirement...
```
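
A small sketch for inspecting the conflicting pins locally; only the standard-library importlib.metadata is used, and the package names are the ones mentioned in the issue, assumed to be installed:

```python
# Sketch: print installed versions and each package's declared constraint on
# transformers, to see where the version conflict comes from.
from importlib.metadata import requires, version

for pkg in ("FlagEmbedding", "sentence-transformers", "transformers"):
    print(pkg, version(pkg))

for pkg in ("FlagEmbedding", "sentence-transformers"):
    pins = [r for r in (requires(pkg) or []) if r.lower().startswith("transformers")]
    print(pkg, "requires", pins)
```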

For example, local_files_only must be passed through, otherwise the model cannot be used once the network is unavailable. Passing all kwargs straight through is safe, because the from_pretrained function in transformers/models/auto/auto_factory.py filters the arguments itself.
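
A minimal sketch of the forwarding behaviour being requested, assuming a hypothetical wrapper (load_offline is not a FlagEmbedding function) that simply passes extra keyword arguments down to transformers' from_pretrained:

```python
# Sketch: forward caller kwargs such as local_files_only straight through to
# from_pretrained; transformers filters out arguments it does not recognize.
from transformers import AutoModel, AutoTokenizer

def load_offline(model_name_or_path: str, **hf_kwargs):
    # hf_kwargs may carry local_files_only=True, cache_dir=..., revision=..., etc.
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, **hf_kwargs)
    model = AutoModel.from_pretrained(model_name_or_path, **hf_kwargs)
    return tokenizer, model

# With the files already in the local cache this works without network access.
tokenizer, model = load_offline("BAAI/bge-m3", local_files_only=True)
```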

How can I fine-tune the dense and sparse embeddings only? I tried to use this script:
```bash
%%bash
torchrun --nproc_per_node 1 \
    -m FlagEmbedding.finetune.embedder.encoder_only.m3 \
    --model_name_or_path /home/alex/ejada/developers/martina/my_cache/models--BAAI--bge-m3 \
    --cache_dir...
```

I want to use a single reranker model to score two kinds of candidates at once: similar queries and documents. Could it happen that a similar query is given a higher score than a document? (It could also naturally be the other way around, with documents scoring higher than similar queries.) How should this be handled?
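
One way to observe this directly is to score both kinds of candidates against the same query and compare; a small sketch assuming bge-reranker-large and made-up texts, using FlagReranker.compute_score:

```python
# Sketch: score a paraphrased query and a document against the same query with
# one reranker, to see which kind of candidate gets the higher relevance score.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)

query = "how to fine-tune bge-m3"                         # example query
similar_query = "steps for fine-tuning the bge-m3 model"  # paraphrase of the query
document = ("BGE-M3 can be fine-tuned with torchrun using the example scripts, "
            "starting from the bge-m3-unsupervised checkpoint.")

scores = reranker.compute_score([[query, similar_query], [query, document]])
print(scores)  # compare the paraphrase score against the document score
```

If the scores for the two candidate types turn out not to be comparable, ranking them in separate pools is one possible workaround.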

This issue originally comes from the following observation: loading bge-reranker-large with different libraries (sentence_transformers and others) and reranking with each produces different results (i.e., a different order after reranking). One of the reasons is that sentence_transformers strips whitespace from the input. The script below simulates sentence_transformers' whitespace-stripping behaviour:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "/data/reranker/bge-reranker-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def rerank(query, documents):...
```
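
A compact sketch of the comparison the script is driving at: scoring the same pair with and without the whitespace stripping that sentence_transformers applies. It is not the issue's full script, and the Hub checkpoint name and texts are assumptions:

```python
# Sketch: score one (query, passage) pair with raw and with stripped inputs
# to see whether the logits, and hence the reranked order, change.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "BAAI/bge-reranker-large"  # assumes the Hub checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

query = "what is dense retrieval"
passage = "  Dense retrieval encodes queries and documents into vectors.  "

def score(q: str, p: str) -> float:
    inputs = tokenizer([[q, p]], padding=True, truncation=True,
                       return_tensors="pt", max_length=512)
    with torch.no_grad():
        return model(**inputs).logits.view(-1).item()

print("raw inputs:     ", score(query, passage))
print("stripped inputs:", score(query.strip(), passage.strip()))
```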

While testing the bge-m3 embedding model, I wanted to see how it behaves under varying scenarios. After generating sparse embeddings and storing them in a JSON file, I wanted to calculate their...
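
A short sketch of one way to do this, assuming the goal is the lexical (sparse) matching score between stored embeddings; encode and compute_lexical_matching_score are the calls documented for BGEM3FlagModel, while the file name and texts are made up:

```python
# Sketch: produce bge-m3 sparse (lexical) weights, round-trip them through a
# JSON file, then compare two of them with the lexical matching score.
import json
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

texts = ["What is BGE M3?", "BGE M3 is a multi-functional embedding model."]
out = model.encode(texts, return_dense=True, return_sparse=True)

# lexical_weights is a list of {token_id: weight} dicts; cast the numpy
# weights to plain floats so that json can serialize them.
serializable = [{k: float(v) for k, v in lw.items()} for lw in out["lexical_weights"]]
with open("sparse_embeddings.json", "w") as f:
    json.dump(serializable, f)

with open("sparse_embeddings.json") as f:
    stored = json.load(f)

score = model.compute_lexical_matching_score(stored[0], stored[1])
print(score)
```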