
Finetune BGE-M3

tenafrangelos opened this issue 11 months ago · 2 comments

How can I fine-tune only the dense and sparse embeddings? I tried using this script:

%%bash
torchrun --nproc_per_node 1 \
    -m FlagEmbedding.finetune.embedder.encoder_only.m3 \
    --model_name_or_path /home/alex/ejada/developers/martina/my_cache/models--BAAI--bge-m3 \
    --cache_dir ./cache/model \
    --train_data ./ft_data/training.json \
    --train_group_size 4 \
    --query_max_len 256 \
    --passage_max_len 256 \
    --pad_to_multiple_of 4 \
    --query_instruction_for_retrieval 'Represent this sentence for searching relevant passages: ' \
    --query_instruction_format '{}{}' \
    --knowledge_distillation False \
    --output_dir ./test_encoder \
    --learning_rate 1e-5 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --dataloader_drop_last True \
    --warmup_ratio 0.1 \
    --logging_steps 1 \
    --save_steps 1000 \
    --negatives_cross_device \
    --temperature 0.02 \
    --sentence_pooling_method cls \
    --normalize_embeddings True \
    --kd_loss_type m3_kd_loss \
    --unified_finetuning True \
    --use_self_distill True \
    --fix_encoder True \
    --colbert_dim 0 \
    --self_distill_start_step 0
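Once training finishes, one quick sanity check is to load the checkpoint with FlagEmbedding's BGEM3FlagModel and request only the dense and sparse outputs; the path below is assumed to match the --output_dir from the script above:

    from FlagEmbedding import BGEM3FlagModel

    # Load the fine-tuned checkpoint (path assumed to match --output_dir above).
    model = BGEM3FlagModel("./test_encoder", use_fp16=True)

    # Ask for dense and sparse outputs only; skip the ColBERT vectors.
    out = model.encode(
        ["what is BGE-M3?"],
        return_dense=True,
        return_sparse=True,
        return_colbert_vecs=False,
    )
    print(out["dense_vecs"].shape)   # dense sentence embeddings
    print(out["lexical_weights"])    # per-token sparse weights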

tenafrangelos · Jan 20 '25

The ColBERT vectors and the sparse embeddings are fine-tuned together. If you want to drop the ColBERT vectors, you need to remove the corresponding code in the finetune module.
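To make that concrete, here is a minimal sketch of the kind of change implied; the function and variable names are illustrative stand-ins, not the m3 finetune module's exact API:

    # Illustrative only: names approximate the m3 finetune module's logic.
    def ensemble_score(dense_scores, sparse_scores, colbert_scores=None):
        # With all three heads, the teacher score sums the dense, weighted
        # sparse, and ColBERT similarities.
        if colbert_scores is not None:
            return dense_scores + 0.3 * sparse_scores + colbert_scores
        # With the ColBERT head removed, only dense and sparse remain; the
        # matching ColBERT loss terms must be dropped from the objective too.
        return dense_scores + 0.3 * sparse_scores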

545999961 · Jan 23 '25

Thanks for your answer, it works. I updated the loss function as follows. Before:

  1. return dense_scores + 0.3 * sparse_scores + colbert_scores
  2. loss = (loss + ensemble_loss + 0.1 * sparse_loss + colbert_loss) / 4
  3. loss += (dense_self_distill_loss + 0.1 * sparse_self_distill_loss + colbert_self_distill_loss) / 3

After (consolidated in the sketch below the list):

  1. return dense_scores + 0.3 * sparse_scores
  2. loss = (loss + ensemble_loss + 0.1 * sparse_loss) / 3
  3. loss += (dense_self_distill_loss + 0.1 * sparse_self_distill_loss) / 2
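Taken together, the modified two-head objective reads roughly as follows; compute of the individual loss terms is unchanged, and the names below are stand-ins for whatever tensors the finetune module actually passes around:

    # Sketch of the two-head objective after removing the ColBERT terms.
    def two_head_loss(dense_loss, ensemble_loss, sparse_loss,
                      dense_self_distill_loss, sparse_self_distill_loss):
        # Average over the three remaining contrastive terms (was four).
        loss = (dense_loss + ensemble_loss + 0.1 * sparse_loss) / 3
        # Self-distillation now averages over two heads (was three).
        loss += (dense_self_distill_loss + 0.1 * sparse_self_distill_loss) / 2
        return loss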

Is that valid, or is there a better formulation? Should I reduce the weight on sparse_scores?

tenafrangelos · Jan 28 '25