InternVL
[Bug] CPU OOM when training InternVL3_5-30B-A3B (HF-format weights) with VeRL
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
The machine has 2 TB of host RAM. After load_pretrain and FSDP initialization, about 1.9 TB is already in use, so training essentially cannot start.
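For scale, here is a rough back-of-the-envelope sketch of how much host RAM full CPU-side copies of the weights would take. It is not a confirmed diagnosis: the parameter count is read off the "30B" in the model name, and the number of replicas (for example, one per rank on the node) and the dtype are assumptions. The helper `host_ram_estimate_tb` is hypothetical and only does the arithmetic.

```python
def host_ram_estimate_tb(num_params: float, bytes_per_param: int, copies: int) -> float:
    """Host RAM in TB (10**12 bytes) if `copies` full replicas of the weights sit on CPU."""
    return num_params * bytes_per_param * copies / 1e12


# Assumption: roughly 30e9 parameters, taken from the "30B" in the model name.
NUM_PARAMS = 30e9

if __name__ == "__main__":
    # Assumed dtypes and replica counts; the real values depend on the launch config.
    for label, bytes_per_param in (("bf16", 2), ("fp32", 4)):
        for copies in (1, 8, 16):
            tb = host_ram_estimate_tb(NUM_PARAMS, bytes_per_param, copies)
            print(f"{label}, {copies:2d} full copies: {tb:.2f} TB")
```

If the measured usage is close to one of these multiples, that would suggest each process holds a full copy on CPU before sharding, but this needs to be checked against the actual per-process memory.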
Reproduction
Environment
VeRL
Error traceback