With the same effective train batch size, different per-device batch sizes lead to drastically different train loss convergence
Reminder
- [X] I have read the README and searched the existing issues.
System Info
Both the Yi and Qwen2-1.5B models show this problem.
train batch size = 64
per-device batch size = 1 / 2 / 4 (with the gradient accumulation steps set so that the effective train batch size stays at 64)
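For reference, a minimal sketch of the three configurations being compared (assuming a single GPU; the original report does not state how many devices were used):

```python
# Minimal sketch: the three runs compared here, each keeping the effective
# train batch size at 64. NUM_GPUS = 1 is an assumption; the report does not
# state the device count.
TARGET_BATCH_SIZE = 64
NUM_GPUS = 1  # assumption

for per_device in (1, 2, 4):
    grad_accum = TARGET_BATCH_SIZE // (per_device * NUM_GPUS)
    effective = per_device * grad_accum * NUM_GPUS
    print(f"per_device_train_batch_size={per_device}, "
          f"gradient_accumulation_steps={grad_accum}, "
          f"effective_batch_size={effective}")
```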
On the same dataset, the train loss convergence under these three settings differs as follows:
Reproduction
Reproducible with any SFT dataset.
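As a hedged reproduction sketch: LLaMA-Factory builds on the Hugging Face Trainer, so the two fields being varied map onto `TrainingArguments`. The output directory and the remaining hyperparameters below are placeholders and are assumed to be held constant across runs.

```python
# Hypothetical sketch of what changes between the compared runs; only the two
# batch-size fields are varied, everything else is assumed identical.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2_1_5b_sft",      # placeholder
    per_device_train_batch_size=1,    # varied across runs: 1 / 2 / 4
    gradient_accumulation_steps=64,   # varied across runs: 64 / 32 / 16
    learning_rate=1e-5,               # assumption: identical in all runs
    num_train_epochs=3,               # assumption: identical in all runs
    logging_steps=10,                 # assumption
)
```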
Expected behavior
No response
Others
No response