
Finetune BGE-M3

tenafrangelos opened this issue 11 months ago · 2 comments

How can I fine-tune only the dense and sparse embeddings? I tried using this script:

%%bash
torchrun --nproc_per_node 1 \
    -m FlagEmbedding.finetune.embedder.encoder_only.m3 \
    --model_name_or_path /home/alex/ejada/developers/martina/my_cache/models--BAAI--bge-m3 \
    --cache_dir ./cache/model \
    --train_data ./ft_data/training.json \
    --train_group_size 4 \
    --query_max_len 256 \
    --passage_max_len 256 \
    --pad_to_multiple_of 4 \
    --query_instruction_for_retrieval 'Represent this sentence for searching relevant passages: ' \
    --query_instruction_format '{}{}' \
    --knowledge_distillation False \
    --output_dir ./test_encoder \
    --learning_rate 1e-5 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --dataloader_drop_last True \
    --warmup_ratio 0.1 \
    --logging_steps 1 \
    --save_steps 1000 \
    --negatives_cross_device \
    --temperature 0.02 \
    --sentence_pooling_method cls \
    --normalize_embeddings True \
    --kd_loss_type m3_kd_loss \
    --unified_finetuning True \
    --use_self_distill True \
    --fix_encoder True \
    --colbert_dim 0 \
    --self_distill_start_step 0
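Once training finishes, one quick sanity check is to load the checkpoint with FlagEmbedding's BGEM3FlagModel and request only the dense and sparse outputs; the path below is assumed to match the --output_dir from the script above:

    from FlagEmbedding import BGEM3FlagModel

    # Load the fine-tuned checkpoint (path assumed to match --output_dir above).
    model = BGEM3FlagModel("./test_encoder", use_fp16=True)

    # Ask for dense and sparse outputs only; skip the ColBERT vectors.
    out = model.encode(
        ["what is BGE-M3?"],
        return_dense=True,
        return_sparse=True,
        return_colbert_vecs=False,
    )
    print(out["dense_vecs"].shape)   # dense sentence embeddings
    print(out["lexical_weights"])    # per-token sparse weights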

tenafrangelos · Jan 20 '25

The ColBERT vectors and the sparse embeddings are fine-tuned together. If you want to drop the ColBERT vectors, you need to remove the corresponding code in the finetune module.
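To make that concrete, here is a minimal sketch of the kind of change implied; the function and variable names are illustrative stand-ins, not the m3 finetune module's exact API:

    # Illustrative only: names approximate the m3 finetune module's logic.
    def ensemble_score(dense_scores, sparse_scores, colbert_scores=None):
        # With all three heads, the teacher score sums the dense, weighted
        # sparse, and ColBERT similarities.
        if colbert_scores is not None:
            return dense_scores + 0.3 * sparse_scores + colbert_scores
        # With the ColBERT head removed, only dense and sparse remain; the
        # matching ColBERT loss terms must be dropped from the objective too.
        return dense_scores + 0.3 * sparse_scores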

545999961 · Jan 23 '25

Thanks for your answer, it works. I updated the loss function as follows. Before:

  1. return dense_scores + 0.3 * sparse_scores + colbert_scores
  2. loss = (loss + ensemble_loss + 0.1 * sparse_loss + colbert_loss) / 4
  3. loss += (dense_self_distill_loss + 0.1 * sparse_self_distill_loss + colbert_self_distill_loss) / 3

After (consolidated in the sketch below the list):

  1. return dense_scores + 0.3 * sparse_scores
  2. loss = (loss + ensemble_loss + 0.1 * sparse_loss) / 3
  3. loss += (dense_self_distill_loss + 0.1 * sparse_self_distill_loss) / 2
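Taken together, the modified two-head objective reads roughly as follows; compute of the individual loss terms is unchanged, and the names below are stand-ins for whatever tensors the finetune module actually passes around:

    # Sketch of the two-head objective after removing the ColBERT terms.
    def two_head_loss(dense_loss, ensemble_loss, sparse_loss,
                      dense_self_distill_loss, sparse_self_distill_loss):
        # Average over the three remaining contrastive terms (was four).
        loss = (dense_loss + ensemble_loss + 0.1 * sparse_loss) / 3
        # Self-distillation now averages over two heads (was three).
        loss += (dense_self_distill_loss + 0.1 * sparse_self_distill_loss) / 2
        return loss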

Is that valid, or is there a better formulation? Should I reduce the weight on sparse_scores?

tenafrangelos · Jan 28 '25