FlagEmbedding issues

Is BGE Embedding a dynamic embedding?

1

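A question: since the bge models incorporate sentence-level contextual semantics during fine-tuning, does the same word get a different embedding vector in sentences where it means different things? For example, "光盘" in "我买了一张光盘" ("I bought a disc") and in "光盘行动" (the "Clean Plate" campaign) should have different word vectors; we just cannot know exactly what those vectors are.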
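To make the question concrete, here is a toy sketch of why token vectors from a Transformer encoder depend on the surrounding sentence. It is not BGE code: the three-word vocabulary, the vectors in `STATIC`, and the single attention pass with identity projections are all assumptions made for illustration. The word "disc" stands in for "光盘".

```python
import math

# Hypothetical static (context-free) vectors for a tiny vocabulary.
STATIC = {
    "buy":      [1.0, 0.0, 0.0],
    "disc":     [0.0, 1.0, 0.0],
    "campaign": [0.0, 0.0, 1.0],
}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def contextualize(tokens):
    """One self-attention pass with identity projections: every output vector
    is a softmax-weighted average of all static vectors in the sentence."""
    vecs = [STATIC[t] for t in tokens]
    out = []
    for q in vecs:
        scores = [sum(a * b for a, b in zip(q, k)) for k in vecs]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, vecs))
                    for d in range(len(q))])
    return out

# The same token, "disc", in two different toy sentences.
disc_in_purchase = contextualize(["buy", "disc"])[1]
disc_in_campaign = contextualize(["disc", "campaign"])[0]
print(disc_in_purchase)
print(disc_in_campaign)
```

The two printed vectors share the "disc" component but mix in different neighbours, so they differ. In a real encoder the same effect is stacked over many layers, and the sentence embedding is then read from those context-dependent states.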

AttributeError: type object 'Dataset' has no attribute 'from_list'

1

Hello. Line **283** of the file `eval_icl`, used when evaluating in-context learning, reads `dataset = datasets.Dataset.from_list(flat_data)` and causes the following error: `AttributeError: type object 'Dataset' has...

israaexol
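The message suggests that the installed `datasets` release predates `Dataset.from_list`, so upgrading the package is the likely fix. If upgrading is not possible, one possible workaround is to convert the rows to columns and use `Dataset.from_dict`, which older releases also provide. A minimal sketch, assuming `flat_data` is a list of dicts; `rows_to_columns`, `build_dataset`, and the sample rows are hypothetical names and data, not part of the repository:

```python
def rows_to_columns(rows):
    """Convert a list of dicts (row-oriented) into a dict of lists (column-oriented)."""
    columns = {}
    # Collect every key first so rows with missing keys still line up.
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    for row in rows:
        for key in columns:
            columns[key].append(row.get(key))
    return columns

def build_dataset(flat_data):
    """Build a Dataset without calling Dataset.from_list."""
    import datasets  # imported lazily so rows_to_columns works on its own
    return datasets.Dataset.from_dict(rows_to_columns(flat_data))

# Hypothetical rows standing in for flat_data.
example_rows = [{"query": "q1", "label": 1}, {"query": "q2", "label": 0}]
example_columns = rows_to_columns(example_rows)
print(example_columns)
```

Replacing the failing line with `dataset = build_dataset(flat_data)` should then behave the same on old and new releases, assuming every row has the same keys.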

BGE-M3 MCLS implementation

The BGE-M3 paper mentioned the MCLS (Multiple CLS) strategy to enhance the model’s long-text capabilities without the need for training. Does this repo contain the implementation for this strategy?

nntoan209
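For readers unfamiliar with the strategy, the idea as I understand it from the paper is to insert an extra CLS token at a fixed interval of tokens and to average the last hidden states of all CLS tokens into the final embedding. The sketch below only illustrates that bookkeeping. It is not taken from this repo; the function names, the default interval of 256, and the fake hidden states are assumptions, and it ignores attention masks and any special tokens the tokenizer adds.

```python
def insert_cls_tokens(token_ids, cls_id, interval=256):
    """Return (new_ids, cls_positions) with cls_id placed before every
    chunk of `interval` tokens."""
    new_ids, cls_positions = [], []
    for start in range(0, len(token_ids), interval):
        cls_positions.append(len(new_ids))
        new_ids.append(cls_id)
        new_ids.extend(token_ids[start:start + interval])
    return new_ids, cls_positions

def mcls_embedding(hidden_states, cls_positions):
    """Average the hidden states found at the CLS positions into one vector."""
    picked = [hidden_states[p] for p in cls_positions]
    dim = len(picked[0])
    return [sum(v[d] for v in picked) / len(picked) for d in range(dim)]

# Toy usage: 5 tokens, one CLS (id 0) before every 2 tokens.
ids, positions = insert_cls_tokens([11, 12, 13, 14, 15], cls_id=0, interval=2)

# Fake per-token hidden states standing in for the encoder output.
fake_hidden = [[float(i), 1.0] for i in range(len(ids))]
pooled = mcls_embedding(fake_hidden, positions)
print(ids, positions, pooled)
```

In a real pipeline the hidden states would come from running the encoder on `ids`; only the insertion and pooling steps are shown here.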

Embedding fine-tuning question

4

1. After fine-tuning, the loss will not go down any further. The parameters are below. This was run on a local CPU, and after 5 epochs of training it looks roughly like this. What is the cause, or is there anything that could be optimized? {"epoch": 4.18,"learning_rate": 1.6492693110647182e-06,"loss": 0.2706,"step": 2000} torchrun --nproc_per_node 1 -m FlagEmbedding.baai_general_embedding.finetune.run --output_dir ./src/aiChatServer/fintune/model --model_name_or_path ./src/aiChatServer/fintune/bge-m3 --train_data ./src/aiChatServer/fintune/fintune_res.jsonl --learning_rate 1e-5 --num_train_epochs 5 --per_device_train_batch_size 20 --dataloader_drop_last False --normlized True --temperature 0.02...

ListenZen
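One way to read the reported number: assuming the objective is a temperature-scaled softmax cross-entropy over similarity scores (an assumption suggested by the `--temperature` flag, not confirmed by the post), the loss is the negative log-probability assigned to the positive passage. The sketch below shows that computation and converts the logged loss back into a probability. The function names and toy scores are hypothetical.

```python
import math

def info_nce_loss(scores, positive_index=0, temperature=0.02):
    """Cross-entropy of the positive among all candidates,
    computed over temperature-scaled similarity scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(x - m) for x in scaled))
    return log_sum - scaled[positive_index]

def positive_probability(loss):
    """Probability mass on the positive that corresponds to a given loss."""
    return math.exp(-loss)

# Loss value taken from the log line in the post.
reported = positive_probability(0.2706)
print(round(reported, 2))  # about 0.76

# A positive that ties with one negative gives a loss of ln(2).
tie = info_nce_loss([0.8, 0.8])
print(tie)
```

Under that assumption, a loss of 0.2706 means the positive receives roughly 76% of the softmax mass on a typical example, so the model is already ranking most positives first. The logged learning rate of about 1.65e-06 at epoch 4.18 of 5 is also consistent with a schedule that decays linearly from 1e-5 towards zero, which would leave little room for further change late in training.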

Some questions about the LLM-based rerankers

10

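Hello, I have some questions about the LLM-based rerankers. Your team has released two versions:

1. [bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma)
2. [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise)

To compute the relevance between a query and a doc, the first uses the probability of outputting "yes". The second maps the vector of the last token through an MLP to a single score and can be used layerwise; without layerwise, an off-the-shelf class such as MiniCPMForSequenceClassification could be used instead. My questions:

1. On the same data, have you compared the two methods?
2. On the same data and with the same method, have you compared adding versus not adding the prompt description?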

NLPJCL
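To make the two scoring schemes in the question easier to compare, here is a toy sketch of each, following the description in the post rather than the released model code. The function names, the toy logits, and the single linear layer standing in for the MLP are assumptions.

```python
import math

def yes_probability(last_token_logits, yes_token_id):
    """Scheme 1: softmax over the vocabulary at the final position,
    then read off the probability of the 'yes' token."""
    m = max(last_token_logits)
    exps = [math.exp(x - m) for x in last_token_logits]
    return exps[yes_token_id] / sum(exps)

def linear_head_score(last_hidden_state, weights, bias=0.0):
    """Scheme 2: map the final token's hidden state to one scalar
    (a single linear layer stands in for the MLP here)."""
    return sum(h * w for h, w in zip(last_hidden_state, weights)) + bias

# Toy inputs: a 3-token vocabulary where id 0 plays the role of "yes".
p_yes = yes_probability([2.0, 0.0, 0.0], yes_token_id=0)
head = linear_head_score([1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
print(p_yes, head)
```

The first score lies in (0, 1) and reuses the language-model head; the second is an unbounded scalar that needs its own trained weights, which is why the two are not directly comparable without evaluating on the same data.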

Something about llama-index evaluation

Could you tell me which dataset you used in your llama-index evaluation, or how to run it? Thanks.

aagq

Fine-tuning bge-reranker-v2-minicpm-layerwise gives a loss of 1

12

CUDA_VISIBLE_DEVICES=6,7 torchrun --nproc_per_node 2 \
-m FlagEmbedding.llm_reranker.finetune_for_layerwise.run \
--output_dir ./results/reranker/bge-reranker-v2-minicpm-layerwise \
--model_name_or_path /media/ai/HDD/Teamwork/LLM_Embedding_model/Embedding/Embedding/bge-reranker-v2-minicpm-layerwise \
--train_data /media/ai/HDD/Teamwork/wangenzhi/FlagEmbedding-master/official/FlagEmbedding/fine_data/layer_reranker.jsonl \
--learning_rate 6e-5 \
--fp16 \
--num_train_epochs 1 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 4...

sevenandseven

Impavidity


Plotting the loss function

Incremental pre-training

Training Time of Reranker
