FlagEmbedding issues

Can bge-m3's dense and sparse embeddings be used directly for retrieval tasks?

4

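Hello, I have two questions: 1. Can bge-m3 be used directly, without fine-tuning, to embed a document corpus, with the dense and sparse vectors imported into Milvus for retrieval (there are no positive or negative samples)? 2. I noticed that the dimension of the sparse vector I get corresponds to the length of the sentence after tokenization (my understanding is that a sparse vector is like a bag-of-words model, so its dimension should be the vocabulary size). Does this mean that different sentences have sparse vectors of different dimensions?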

you567
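
To make the second question above concrete, here is a minimal sketch, in plain Python, of how a sparse vector can be stored as only its non-zero entries. It assumes the sparse output is a mapping from token id to weight; the token ids, the weights, `VOCAB_SIZE`, and both helper functions are hypothetical and are not taken from the FlagEmbedding code. Under that assumption, the number of stored entries varies per sentence while the nominal dimension stays fixed at the vocabulary size.

```python
# Hypothetical sparse outputs for two sentences: token id -> weight.
# The ids and weights below are invented for illustration only.
VOCAB_SIZE = 250_002  # assumed vocabulary size; check your own tokenizer

sparse_a = {"1038": 0.21, "4865": 0.33, "83": 0.08}  # 3 stored entries
sparse_b = {"4865": 0.27, "70": 0.05}                # 2 stored entries


def lexical_score(a, b):
    """Dot product of two sparse vectors stored as dicts of non-zero entries."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter dict
    return sum(weight * b[token] for token, weight in a.items() if token in b)


def to_dense(sparse, size):
    """Expand a sparse dict into a full-length list (zeros everywhere else)."""
    vec = [0.0] * size
    for token, weight in sparse.items():
        vec[int(token)] = weight
    return vec


dense_a = to_dense(sparse_a, VOCAB_SIZE)
dense_b = to_dense(sparse_b, VOCAB_SIZE)

# Both expanded vectors have the same length even though the dicts differ.
print(len(sparse_a), len(sparse_b), len(dense_a), len(dense_b))
print(lexical_score(sparse_a, sparse_b))
```

Only the token shared by both dicts contributes to the score, which is why the dict form is usually enough for scoring and the expanded form is rarely needed.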

ValueError: Attempting to unscale FP16 gradients.

4

Here is the Google Colab link I used for fine-tuning: https://colab.research.google.com/drive/1kiALBR1UarPobiftZmiHfwFyk7hTCDnV?usp=sharing When I fine-tune LLM-embed for tool retrieval using the following command on Google Colab: ![image](https://github.com/FlagOpen/FlagEmbedding/assets/80111554/c442eac2-62d1-4651-848e-1f5b86bfadaa) An error occurred:...

QuangTQV

How to adjust hyperparameters when fine-tuning llm embed

2

llm embed has the following training script. I don't know how to adjust hyperparameters like train_batch_size, learning rate, warmup_ratio, ... torchrun --nproc_per_node=8 run_dense.py \ --output_dir data/outputs/tool \ --train_data llm-embedder:tool/toolbench/train.json \...

QuangTQV

How is the maximum document length of 8192 tokens in BGE-M3's multi-granularity implemented?

4

chengzi-big

How is the input max_length computed in the BAAI/bge-reranker-v2-m3 model?

4

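```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True)
# Score a single (query, document) pair
scores = reranker.compute_score(['the question to query', "the document to search...."], normalize=True, max_length=512)
```
Regarding max_length = 512: what exactly is the unit, tokens or characters? What happens when the input exceeds it; is it simply truncated? Does the reranker-v2-m3 model itself have an upper limit on max_length? Is this 512 adjustable (for example, raised to 1024 or...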

thebarkingdog-yh

Loss drops to 2.7 and then stops decreasing

1

Hello, I'm working on a QA dataset task where the query is the question, pos is the answer, and neg is the other options. During fine-tuning, however, the loss drops to 2.7 and then stops decreasing, and the final accuracy is only about 20-30%. Does anyone know what the problem might be? Script: torchrun --nproc_per_node 1 -m FlagEmbedding.baai_general_embedding.finetune.run --output_dir finetune_model --model_name_or_path BAAI/bge-small-en-v1.5 --train_data fine_tune_data_10.jsonl --learning_rate 1e-5 --bf16 --num_train_epochs 5 --per_device_train_batch_size 16 --dataloader_drop_last True --normlized True --temperature 0.02 --query_max_len 82 --passage_max_len 56...

Alexender-Ye
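
One quick sanity check for a plateau like the one reported above is to compare the loss against the chance-level value of a softmax cross-entropy, which is ln(N) when all N candidates receive the same score. The sketch below is plain Python and is not the FlagEmbedding loss code; `chance_level_loss` and `contrastive_loss` are hypothetical helpers, and whether N is the batch size, the group size, or their product depends on the training configuration, which the excerpt does not settle.

```python
import math


def chance_level_loss(num_candidates: int) -> float:
    """Cross-entropy when every candidate gets the same score."""
    return math.log(num_candidates)


def contrastive_loss(scores, positive_index, temperature):
    """Softmax cross-entropy for one query over its candidate scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_sum - scaled[positive_index]


# Candidate counts to try; which one applies is an assumption to verify.
for n in (8, 16, 128):
    print(n, round(chance_level_loss(n), 3))

# Identical scores give the chance-level loss regardless of temperature.
print(contrastive_loss([0.5] * 16, positive_index=0, temperature=0.02))
```

If the observed loss sits close to ln(N) for the number of candidates actually used, the model may be scoring the positive about the same as the negatives, which would point toward checking the data format before tuning hyperparameters.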

finetuning failing

2

I'm getting the following error when following the example fine-tuning steps. RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons:...

riyaj8888

Does hybrid_search have any requirements on the length of sparse vectors?

3

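When running the hybrid_search method against a Milvus database, I get the following error: pymilvus.exceptions.MilvusException: However, the hybrid_search examples for Milvus that I have seen do not place any requirement on the sparse vector length either. Or could something have gone wrong during embedding, so that the length of the sparse vector cannot be found?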

Mycroft-s