
With the same train_batchsize, different per device batch size values lead to drastically different train loss convergence

PROoshio opened this issue 6 months ago · 0 comments

Reminder

  • [X] I have read the README and searched the existing issues.

System Info

Both the Yi and Qwen2-1.5b models show this problem. train_batchsize=64, per device batch size = 1 / 2 / 4 (with different gradient accumulation steps set so that train_batchsize stays at 64). On the same dataset, the train loss convergence is as follows: (image: train loss curves for the different per device batch sizes)
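For reference, a minimal sketch of the three configurations being compared, assuming the usual HF Trainer argument names (`per_device_train_batch_size`, `gradient_accumulation_steps`) and a single GPU (the report does not state the GPU count):

```python
# Sketch of the three settings compared in this issue (assumption: single GPU).
# Effective train batch size = per_device_batch_size * gradient_accumulation_steps * num_gpus.
TARGET_TRAIN_BATCH_SIZE = 64
NUM_GPUS = 1  # assumption; not stated in the report

for per_device_batch_size in (1, 2, 4):
    grad_accum_steps = TARGET_TRAIN_BATCH_SIZE // (per_device_batch_size * NUM_GPUS)
    effective = per_device_batch_size * grad_accum_steps * NUM_GPUS
    print(
        f"per_device_train_batch_size={per_device_batch_size}, "
        f"gradient_accumulation_steps={grad_accum_steps}, "
        f"effective train batch size={effective}"
    )
```

All three settings yield the same effective batch size of 64, yet the reported loss curves diverge, which is the behavior in question.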

Reproduction

Reproducible with any SFT dataset.

Expected behavior

No response

Others

No response

PROoshio · Aug 07 '24 06:08