Jicheng Li
> Remove --fp16 and set --torch_dtype auto.

Thanks for the reply. After making this change, the same error still occurs.
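For reference, a minimal sketch of that change against an assumed ms-swift style launch command; the command name, flag spellings, and the omitted arguments are assumptions rather than the exact original run:

```shell
# Before (assumed failing setup): half precision forced with --fp16
#   swift sft --model <path-to-Qwen3-VL-235B-A22B-Instruct> --fp16 true ...
# After: drop --fp16 and let the dtype follow the checkpoint
swift sft \
  --model <path-to-Qwen3-VL-235B-A22B-Instruct> \
  --torch_dtype auto
  # ...dataset and parallelism arguments from the original command unchanged
```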
> > I'm using 8×8 A100 GPUs to fine-tune (SFT) the Qwen3-VL-235B-A22B-Instruct model, but I keep encountering out-of-memory (OOM) issues regardless of the settings. Could you please advise me on...
> try using tp4ep8pp8 or tp4ep8pp4

Thanks for your reply. Based on that configuration, the total GPU requirement would be tp × ep × pp = 4 × 8 × 8 = 256 or 4 × 8 × 4 = 128 GPUs,...
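For reference, a sketch of the GPU-count arithmetic behind that reply, assuming Megatron-style parallel-size flags (the flag names below are assumptions, and the launcher plus remaining training arguments are omitted):

```shell
# Suggested tp4 ep8 pp8 layout, expressed with assumed Megatron-style flag names:
#   --tensor_model_parallel_size 4 --expert_model_parallel_size 8 --pipeline_model_parallel_size 8

# Minimum world size implied by each suggested layout (tp x ep x pp):
TP=4; EP=8
for PP in 8 4; do
  echo "tp${TP} ep${EP} pp${PP}: needs $((TP * EP * PP)) GPUs"   # prints 256, then 128
done
echo "available: $((8 * 8)) GPUs (8 nodes x 8 A100)"             # prints 64
```

Neither layout fits within the 64 GPUs available on the 8×8 A100 cluster, which is the point of the reply above.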