Jun Liu
@muralinr and @DrizztDoUrden could you please take a look with me too?
Right, I tried on another gfx900 and cannot reproduce this issue either.
It seems that it only fails on gfx906.
Tried on a gfx906 and still cannot reproduce this issue.
@DrizztDoUrden @shurale-nkn could you reproduce this issue?
@carlushuang @shaojiewang do you have a Vega GPU to test whether the issue is reproducible?
The problem is not reproducible on a gfx900 (with ROCm 5.0 base and ROCm 5.2 docker). Lowering the urgency level. However, it is still a "high" issue since it impacts...
Now this issue is happening on gfx908 again: http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/issue_1576_bwdfp16gpuref/5/pipeline @JehandadKhan could we assign one host/API engineer to this issue?
After some discussion: @muralinr could you try running this test multiple times on an MI100 development node and see if we can reproduce it? I would suggest some static code...
@atamazov yes we should not close this (automatically closed with the merged PR for WA). Actually I think the urgency of this one should be higher since now we are...