[Bug] CPU OOM when training InternVL3_5-30B-A3B HF-format weights with VeRL

Open Joel0495 opened this issue 2 months ago • 0 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

The machine has 2 TB of RAM. After load_pretrain and FSDP initialization, 1.9 TB is already in use, so training essentially cannot start.
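
For anyone triaging this, a rough back-of-the-envelope sketch may help frame the number. It assumes (none of this is confirmed in the report) that every rank on the node materializes its own full copy of the weights in host RAM before FSDP shards them, that the model has roughly 30e9 total parameters as the name suggests, and that there are 8 ranks per node. The helper `host_ram_gib` is hypothetical and is not part of VeRL or PyTorch.

```python
def host_ram_gib(n_params: float, bytes_per_param: int,
                 ranks_per_node: int, copies_per_rank: int = 1) -> float:
    """Estimate host RAM in GiB when each rank keeps full CPU copies of the weights.

    All inputs are assumptions supplied by the caller; nothing is measured here.
    """
    total_bytes = n_params * bytes_per_param * ranks_per_node * copies_per_rank
    return total_bytes / 2**30


if __name__ == "__main__":
    # Assumed values: ~30e9 parameters, 8 ranks per node.
    for dtype_name, nbytes in (("bf16", 2), ("fp32", 4)):
        for copies in (1, 2):
            est = host_ram_gib(30e9, nbytes, ranks_per_node=8, copies_per_rank=copies)
            print(f"{dtype_name}, {copies} full copy/copies per rank: {est:.0f} GiB")
```

If the fp32 estimate with more than one transient copy per rank lands near the observed 1.9 TB, that would point toward per-rank full loading as a suspect, but the actual dtype, rank count, and loading path would need to be confirmed from the environment and config.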

Reproduction

Environment

VeRL

Error traceback


Joel0495, Oct 10 '25 10:10