Charles Yang
@pfxuan This is another dataset, truck, trained for 15,000 iterations. The final loss is still above 0.4, so it does not seem to have converged. Setup: Radeon 7900 XTX + ROCm 6.3.3 + Ubuntu 22.04 + torch 2.1.2. [truck15k.zip](https://github.com/user-attachments/files/19718681/truck15k.zip) [15ktruck.txt](https://github.com/user-attachments/files/19718683/15ktruck.txt)
Same question: Qwen2 is relatively outdated. Has Qwen3 been verified? @Vincentwei1021 @wangxiongts @linhaojia13 @lxysl @BradyFU
@jcaesar @eokeeffe @pierotofy @pfxuan Could you give some advice?
@pfxuan @eokeeffe @pierotofy I ran systematic experiments. The issue is caused by Ubuntu 24.04. With the same hardware and the same ROCm and PyTorch versions, it works on Ubuntu 22.04 but fails on Ubuntu...
@pfxuan @eokeeffe @pierotofy Even when running an Ubuntu 22.04 container on an Ubuntu 24.04 host, it doesn't work.
Apologies for any disturbance. I just want to draw attention to this issue, which is a showstopper on AMD hardware. Take it easy, and thanks for the reminder.
Marking this; I'm also looking for an AMD solution.
Also looking for an AMD solution.
Also looking for an AMD Radeon solution.