Jicheng Li

4 comments by Jicheng Li

> Remove `--fp16` and set `--torch_dtype auto`

Thanks for the reply. After making those changes, the error still occurs.
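For context, the `--torch_dtype auto` flag presumably maps to the standard `torch_dtype="auto"` behavior in Transformers, which loads weights in the dtype recorded in the checkpoint config instead of forcing an fp16 cast. A minimal sketch of that loading behavior (the model id is a placeholder, not the exact checkpoint from this thread):

```python
from transformers import AutoModel

# torch_dtype="auto" loads weights in the dtype saved in the checkpoint's
# config (often bfloat16) rather than forcing a cast to float16.
model = AutoModel.from_pretrained(
    "your-org/your-model",  # placeholder id, not the thread's checkpoint
    torch_dtype="auto",
)
print(model.dtype)  # dtype taken from the checkpoint config
```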

> > I'm using 8×8 A100 GPUs to fine-tune (SFT) the Qwen3-VL-235B-A22B-Instruct model, but I keep encountering out-of-memory (OOM) issues regardless of the settings. Could you please advise me on...

> > > > I'm using 8×8 A100 GPUs to fine-tune (SFT) the Qwen3-VL-235B-A22B-Instruct model, but I keep encountering out-of-memory (OOM) issues regardless of the settings. Could you please advise...

> try using tp4ep8pp8 or tp4ep8pp4

Thanks for your reply. Based on your configuration, the total GPU requirement would be tp × ep × pp = 256 or 128 GPUs,...
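A quick sketch of the arithmetic behind that reply, assuming (as the comment itself does) that the tensor- (tp), expert- (ep), and pipeline-parallel (pp) degrees multiply into the minimum world size, and comparing against the 64 GPUs available from the 8×8 A100 setup mentioned above:

```python
# Both suggested layouts exceed the available 8 nodes x 8 A100s = 64 GPUs,
# under the comment's assumption that tp * ep * pp gives the minimum world size.
available = 8 * 8  # 8 nodes x 8 A100s

for tp, ep, pp in [(4, 8, 8), (4, 8, 4)]:
    required = tp * ep * pp  # 256 and 128, respectively
    verdict = "fits" if required <= available else "does not fit"
    print(f"tp{tp}ep{ep}pp{pp}: needs {required} GPUs, {verdict} on {available}")
```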