FlagEmbedding

grad_norm is too large when fine-tuning BGE-M3

Open 5663015 opened this issue 1 year ago • 3 comments

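When fine-tuning BGE-M3, grad_norm is consistently high, around 10 to 20, even though max_grad_norm is set to 1. The dataset is about 10,000 examples of private business data. Some of the parameters: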

do_train=True,
do_eval=True,
train_data=data_path + "/train",
val_data=data_path + '/val',
evaluation_strategy="steps",
eval_steps=0.1,
learning_rate=lr,
bf16=True,
weight_decay=0.01,
warmup_ratio=0.05,
lr_scheduler_type='cosine',
max_grad_norm=1,
logging_first_step=True,
num_train_epochs=num_train_epochs,
per_device_train_batch_size=per_device_train_batch_size,
per_device_eval_batch_size=1,
gradient_accumulation_steps=gradient_accumulation_steps,
dataloader_drop_last=True,
normlized=True,
shuffle_ratio=0.02,
temperature=0.02,
query_max_len=64,
passage_max_len=1024,
train_group_size=8,
negatives_cross_device=True,
overwrite_output_dir=True,
logging_steps=10,
save_strategy='steps',
save_total_limit=1,
save_steps=0.5,
query_instruction_for_retrieval="",
same_task_within_batch=True,
unified_finetuning=False,
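One factor that may contribute to large gradients with these settings is temperature=0.02. This is offered as a possibility, not a diagnosis of this run. The sketch below assumes the loss is a softmax cross-entropy over similarity scores divided by the temperature, with one positive and seven negatives per query (train_group_size=8). Under that assumption the gradient with respect to each score is (p_i - y_i) / temperature, so a temperature of 0.02 multiplies it by 50. The function name and the score values are hypothetical and are not taken from FlagEmbedding.

```python
import math


def contrastive_score_grad(scores, temperature, positive=0):
    """Gradient of -log softmax(scores / T)[positive] with respect to each score."""
    logits = [s / temperature for s in scores]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # d(loss)/d(score_i) = (p_i - y_i) / T
    return [
        (p - (1.0 if i == positive else 0.0)) / temperature
        for i, p in enumerate(probs)
    ]


# Hypothetical similarity scores: index 0 is the positive, the rest are negatives.
scores = [0.70, 0.68, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]

norms = {}
for t in (1.0, 0.02):
    grad = contrastive_score_grad(scores, t)
    norms[t] = math.sqrt(sum(g * g for g in grad))
    print(f"temperature={t}: gradient norm w.r.t. scores = {norms[t]:.3f}")
```

This only measures the gradient with respect to the scores, not the parameter gradient norm that the trainer logs, so the numbers are not directly comparable to the logged grad_norm.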

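I have tried adjusting the learning rate, the number of epochs, and so on, but nothing brings grad_norm down, and the model also overfits.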
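A note on why grad_norm can read 10 to 20 with max_grad_norm=1: norm clipping in the style of torch.nn.utils.clip_grad_norm_ returns the total norm measured before clipping. If the logged grad_norm is that returned value (an assumption about the trainer version in use here), then a large logged value does not mean clipping failed. The pure-Python sketch below mimics that behavior; clip_by_global_norm and the gradient values are hypothetical and written only for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Return (clipped_grads, norm_before_clipping)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale down only when the norm exceeds max_norm; never scale up.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [g * scale for g in grads]
    return clipped, total_norm


grads = [9.0, 12.0]  # hypothetical gradient with a norm of 15
clipped, reported = clip_by_global_norm(grads, max_norm=1.0)
applied = math.sqrt(sum(g * g for g in clipped))

print(f"norm before clipping (what would be logged): {reported:.2f}")
print(f"norm of the gradient actually applied:       {applied:.2f}")
```

In this sketch the update itself is bounded by max_norm while the reported number stays at its pre-clipping value, which would be consistent with the logs described above.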

5663015 avatar Sep 06 '24 10:09 5663015

Hi, did you ever solve this? I am fine-tuning the Visualized BGE-M3 model and grad_norm is extremely large, hovering around 2000.

CwbyWifsy avatar Jun 23 '25 07:06 CwbyWifsy

> Hi, did you ever solve this? I am fine-tuning the Visualized BGE-M3 model and grad_norm is extremely large, hovering around 2000.

No, I just trained with it as it was, and the results were acceptable.

5663015 avatar Jun 23 '25 08:06 5663015

OK, thanks.

CwbyWifsy avatar Jun 23 '25 08:06 CwbyWifsy