With the same effective train batch size, different per-device batch sizes lead to drastically different train loss convergence
Reminder
- [X] I have read the README and searched the existing issues.
System Info
Both the Yi and Qwen2-1.5B models show this problem.
train batch size = 64
per-device batch size = 1 / 2 / 4 (with the gradient accumulation steps set so that the effective train batch size stays at 64)
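For reference, a minimal sketch of the three configurations being compared (assuming a single GPU; the original report does not state how many devices were used):

```python
# Minimal sketch: the three runs compared here, each keeping the effective
# train batch size at 64. NUM_GPUS = 1 is an assumption; the report does not
# state the device count.
TARGET_BATCH_SIZE = 64
NUM_GPUS = 1  # assumption

for per_device in (1, 2, 4):
    grad_accum = TARGET_BATCH_SIZE // (per_device * NUM_GPUS)
    effective = per_device * grad_accum * NUM_GPUS
    print(f"per_device_train_batch_size={per_device}, "
          f"gradient_accumulation_steps={grad_accum}, "
          f"effective_batch_size={effective}")
```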
On the same dataset, the train loss convergence under these three settings differs as follows:
Reproduction
Reproducible with any SFT dataset.
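As a hedged reproduction sketch: LLaMA-Factory builds on the Hugging Face Trainer, so the two fields being varied map onto `TrainingArguments`. The output directory and the remaining hyperparameters below are placeholders and are assumed to be held constant across runs.

```python
# Hypothetical sketch of what changes between the compared runs; only the two
# batch-size fields are varied, everything else is assumed identical.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2_1_5b_sft",      # placeholder
    per_device_train_batch_size=1,    # varied across runs: 1 / 2 / 4
    gradient_accumulation_steps=64,   # varied across runs: 64 / 32 / 16
    learning_rate=1e-5,               # assumption: identical in all runs
    num_train_epochs=3,               # assumption: identical in all runs
    logging_steps=10,                 # assumption
)
```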
Expected behavior
No response
Others
No response