
Retrieval and Retrieval-augmented LLMs

Results: 622 FlagEmbedding issues, sorted by recently updated

Hi - I'm trying to download the en and zh data from [this page](https://data.baai.ac.cn/details/BAAI-MTP). However, it keeps asking me to scan a WeChat QR code to log in, and even then it doesn't work. Is there...

Here are 3 texts: ![Text 1](https://github.com/user-attachments/assets/db7304ac-9d91-48b1-8cf2-207fb6c9a7df) ![Text 2](https://github.com/user-attachments/assets/3c60a5bc-10b8-42c2-87bd-517b6dca2c09) ![Text 3](https://github.com/user-attachments/assets/19d00fed-70d2-4451-83be-3dd3aebe91b8) Loading with FlagModel, comparison of the vectors of texts 1 and 2: ![Loaded with FlagModel](https://github.com/user-attachments/assets/4bb58824-01d0-4348-9e3f-c601251abda0) ![Vector comparison, texts 1 and 2](https://github.com/user-attachments/assets/03d026b3-0d3a-4d3b-af08-d04c3b6e5a3c) Loading with SentenceTransformer, comparison of the vectors of text...

I used FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py and built my own corpus and queries following the required file format, but the evaluation returns {'MRR@1': 0.0, 'MRR@10': 0.0, 'MRR@100': 0.0, 'Recall@1': 0.0, 'Recall@10': 0.0, 'Recall@100': 0.0}. Part of the fine-tuning data looks like this: {"query": "补骨脂 对人体的哪些脏腑有作用,具体作用是什么?", "pos": ["入脾命门心包三经。为壮火益土之品。(补相火以通君火)"], "neg": ["主温中。心腹痛。呕吐。去口臭气。(别录) 下气。止霍乱。一切冷气。消 酒 毒。 吐酸。", "入肝肾二经。为冲和之品。(兼补剂 能引肺金之气入肾)", "回春曰。河水与井水合用。亦名阴阳水。 以上宣剂水部",...

In FlagEmbedding, hard negatives are mined by rank position (FlagEmbedding/baai_general_embedding/finetune/hn_mine.py). Is there code that mines hard negatives based on similarity score instead?
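Score-band mining is not hard to sketch on top of precomputed embeddings. The helper below is a hypothetical illustration, not FlagEmbedding's code: instead of taking negatives from a rank window (as `hn_mine.py` does), it keeps candidates whose cosine similarity to the query falls inside a `[low, high]` band, which filters out both trivially easy negatives and likely false negatives. The function name and thresholds are assumptions for the example.

```python
import numpy as np

def mine_hard_negatives(query_emb, corpus_embs, pos_ids, low=0.4, high=0.95, k=5):
    """Pick negatives whose cosine similarity to the query falls inside a
    score band, rather than by rank position (illustrative sketch only).

    query_emb:   (d,) L2-normalized query embedding
    corpus_embs: (n, d) L2-normalized passage embeddings
    pos_ids:     set of indices of known positives to exclude
    """
    # Cosine similarity reduces to a dot product for normalized vectors.
    scores = corpus_embs @ query_emb
    candidates = [
        (i, s) for i, s in enumerate(scores)
        if i not in pos_ids and low <= s <= high
    ]
    # Hardest (highest-scoring) surviving negatives first.
    candidates.sort(key=lambda t: t[1], reverse=True)
    return [i for i, _ in candidates[:k]]
```

The `high` cutoff matters in practice: passages scoring almost as high as the positive are often unlabeled positives, and training on them as negatives hurts the model.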

Dear Authors, Firstly, thank you for your insightful paper, "Llama2Vec: Unsupervised Adaptation of Large Language Models for Dense Retrieval." I found it highly informative and am excited about its potential...

Is it possible to compute the ColBERT score for the m3 model for more than one pair at a time? The current approach seems very inefficient.
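For context, the ColBERT-style score bge-m3 reports is a MaxSim aggregation over token embeddings, which can be batched once the token vectors are in hand. The sketch below assumes the token embeddings have already been produced (e.g. by `BGEM3FlagModel` with `return_colbert_vecs=True`) and are L2-normalized; padding and masking from the real pipeline are omitted.

```python
import numpy as np

def colbert_scores(query_tokens, doc_tokens_list):
    """Score one query against many documents with MaxSim (sketch).

    query_tokens:    (Lq, d) normalized query token embeddings
    doc_tokens_list: list of (Ld_i, d) normalized doc token embeddings
    Returns one score per document: for each query token, take the max
    similarity over the doc's tokens, then average over query tokens.
    """
    scores = []
    for doc in doc_tokens_list:
        sim = query_tokens @ doc.T             # (Lq, Ld) token-level similarities
        scores.append(sim.max(axis=1).mean())  # MaxSim per query token, then mean
    return np.array(scores)
```

If the documents are padded to a common length, the inner loop collapses into a single `(B, Lq, Ld)` einsum with a mask, which is where most of the batching speedup comes from.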

I have fine-tuned the reranker using this repo. I would like to continue training from the saved checkpoint. Can you give me some instructions on how to do that?

Recently I completed a RAG system project, and I want to use the three retrieval methods in bge-m3. However, when loading the model with BGEM3FlagModel(), errors will...
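Once the three signals (dense, sparse/lexical, and ColBERT multi-vector) are computed, bge-m3 combines them as a weighted sum. The helper below is an illustrative fusion sketch; the weights shown are example values, not necessarily the library's defaults.

```python
import numpy as np

def hybrid_score(dense, sparse, colbert, weights=(0.4, 0.2, 0.4)):
    """Weighted fusion of bge-m3's three relevance signals (sketch).

    dense, sparse, colbert: arrays of per-pair scores, same length.
    weights: illustrative mixing weights, normalized to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # convex combination keeps the fused score in scale
    s = np.stack([np.asarray(dense), np.asarray(sparse), np.asarray(colbert)])
    return (w[:, None] * s).sum(axis=0)
```

Tuning the three weights on a small validation set is usually worthwhile, since the best mix depends on how lexical the queries are.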

![image](https://github.com/user-attachments/assets/db78465d-7f49-4ef6-b830-189b3f06283c) Parameters: --learning_rate 3e-5 \ --fp16 \ --num_train_epochs 2 \ --per_device_train_batch_size 4 \ --dataloader_drop_last True \ --normlized False \ --temperature 0.02 \ --query_max_len 512 \ --passage_max_len 512 \ --train_group_size 6...

I see that max_len is set to 512 in the training parameters. I now want to fine-tune bge_reranker_large to handle a length of 2k. Is it enough to simply change max_len to 2k? After training, max_position_embeddings in the model config is still 512+2. If I want to fine-tune to 2k, which other parameter do I need to change?
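Changing the tokenizer's max_len alone does not grow the model's position-embedding table, which is why max_position_embeddings stays at 514 (512 plus RoBERTa-style padding offsets). One common workaround is to resize that table and interpolate the old rows into the new range before continuing fine-tuning. The function below is a generic numpy sketch of that interpolation, not FlagEmbedding's procedure; the resized weights would still need to be written back into the checkpoint and trained further.

```python
import numpy as np

def extend_position_embeddings(pe, new_len):
    """Linearly interpolate a (old_len, d) position-embedding matrix to
    (new_len, d). Sketch only: RoBERTa-style models also reserve special
    padding positions, and the stretched embeddings need further training
    before they are useful at the longer length.
    """
    old_len, d = pe.shape
    old_x = np.linspace(0.0, 1.0, old_len)   # old positions mapped to [0, 1]
    new_x = np.linspace(0.0, 1.0, new_len)   # new positions on the same axis
    out = np.empty((new_len, d))
    for j in range(d):
        out[:, j] = np.interp(new_x, old_x, pe[:, j])
    return out
```

After replacing the embedding weights, max_position_embeddings in the config must be updated to match the new table size, or loading the checkpoint will fail with a shape mismatch.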