FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Results: 622 FlagEmbedding issues, sorted by recently updated

Running

```python
import mteb
from sentence_transformers import SentenceTransformer

model_name = "BAAI/bge-reranker-base"
model = SentenceTransformer(model_name)
tasks = mteb.get_tasks(tasks=["SciDocsRR"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")
```

gives the following result:

```
{ "dataset_revision": "d3c5e1fc0b855ab6097bf1cda04dd73947d7caab", "evaluation_time": 146.51545810699463, "kg_co2_emissions":...
```

Hi, I want to reproduce the results of Visualized BGE, but the zero-shot benchmarks, such as WebQA, are not clearly documented. Could you provide the evaluation datasets and code for the zero-shot benchmark? Thanks!

In my observations, the score fluctuates significantly, which makes it challenging to interpret the results on a standard scale. I would like to normalize these output scores to a range...
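One common way to put fluctuating cross-encoder scores on a standard scale is a sigmoid rescaling. Below is a minimal sketch; the `normalize_score` helper is illustrative, not part of FlagEmbedding, and the sample scores are made up:

```python
import math

def normalize_score(raw_score: float) -> float:
    """Map an unbounded reranker logit into (0, 1) with a sigmoid.

    This is only a monotonic rescaling for readability; it does not
    calibrate the scores against relevance labels.
    """
    return 1.0 / (1.0 + math.exp(-raw_score))

# Illustrative raw scores: large negative -> close to 0, large positive -> close to 1.
for raw in (-4.2, 0.0, 7.9):
    print(raw, "->", round(normalize_score(raw), 4))
```

If the model is loaded through FlagEmbedding's `FlagReranker`, I believe newer releases also expose a `normalize` option on `compute_score` that applies the same sigmoid, though that depends on the installed version.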

Hello, I'd like to ask how hard negatives are selected during the stage-2 training phase. The paper says it2i uses 3; are those 3 picked at random? Does t2it also need hard negatives? I'd also like to ask whether, during stage-2 training, a batch consists entirely of it2i or entirely of t2it, and whether there are any other additional operations.

To avoid overfitting, I would like to add a validation (eval) dataset to the finetuning. I see that BiTrainer is a subclass of Hugging Face's [Trainer](https://huggingface.co/docs/transformers/en/main_classes/trainer) class, which accepts an eval_dataset argument....
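A minimal sketch of what passing an eval split could look like, assuming BiTrainer forwards the standard Trainer keyword arguments to its Hugging Face parent; the import path, `model`, datasets, and collator are placeholders taken from the existing finetune setup, not a confirmed API:

```python
from transformers import TrainingArguments
# Path assumed from the repo layout (see the hn_mine module path used elsewhere).
from FlagEmbedding.baai_general_embedding.finetune.trainer import BiTrainer

training_args = TrainingArguments(
    output_dir="bge_finetune_out",
    evaluation_strategy="steps",  # `eval_strategy` in newer transformers releases
    eval_steps=500,               # evaluate on the held-out split every 500 steps
)

trainer = BiTrainer(
    model=model,                  # placeholder: the bi-encoder built by the finetune script
    args=training_args,
    train_dataset=train_dataset,  # placeholder: existing training split
    eval_dataset=eval_dataset,    # placeholder: the validation split added to watch for overfitting
    data_collator=data_collator,  # placeholder: the script's embedding collator
)
trainer.train()
```

If that works, `load_best_model_at_end` and `metric_for_best_model` in the same `TrainingArguments` would let the trainer keep the checkpoint with the best eval metric.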

Currently, Unsloth only supports single-GPU training; how can this be run with 8-GPU training? Thanks!

```
root@autodl-container-47bb4a8b0c-047da1e4:~# python -m FlagEmbedding.baai_general_embedding.finetune.hn_mine \
>     --model_name_or_path /root/autodl-tmp/models/bge-base-en-v1.5 \
>     --input_file /root/bge_ft_train_wo_titles_wo_tables-2024-07-12.jsonl \
>     --output_file /root/bge_ft_train_wo_titles_wo_tables-2024-07-12_mineHN.jsonl \
>     --range_for_sampling 2-50 \
>     --negative_number 3 \
>     --use_gpu_for_searching
Traceback (most...
```

When I set up a LongLLM training environment on an NPU device, the flash-attention dependency cannot be installed. Can the LongLLM training scripts be used on an NPU?

What does this parameter configure?

**Scenario**: fine-tuning BGE-M3; the .jsonl data file contains 158,000 records, each with one query, a pos list of length 1, and a neg list of length 15. **Error**: WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal...