Post-merge issue of PR #644
The corresponding logic in #644 should be OK, but some threads may not perform the thread-wise copy. This logic may be better: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/644#discussion_r545192142
This logic needs to be inspected: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/644#pullrequestreview-554817964
This logic is there because the following command failed; I will look into it: ./bin/MIOpenDriver convfp16 -n 256 -c 1024 -H 14 -W 14 -k 256 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1
@junliume Unfortunately, I do not see how this was resolved. The code remains the same and this ticket has not been updated to prove that the current implementation is fine. I am afraid we are still in danger of elusive issues.
/cc @asleepzzz @asroy
@asleepzzz Has this been resolved as of the latest ROCm 6.1.1? Thanks
A quick run of the mentioned test passes:
root@b6e4a3c3408f:~/workspace/MIOpen/build# ./bin/MIOpenDriver convfp16 -n 256 -c 1024 -H 14 -W 14 -k 256 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1
MIOpenDriver convfp16 -n 256 -c 1024 -H 14 -W 14 -k 256 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1
PRNG seed: 12345678
MIOpen Backward Data Conv. Algorithm: 3, Solution: 37/ConvBinWinogradRxSf3x2
GPU Kernel Time Backward Data Conv. Elapsed: 1.127537 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdd-conv1x1u1, 256, 1024, 1, 1, 256, 14, 14, 26306674688, 134742016, 25690112, 23331, 142, 1.127537
Backward Convolution Data Verifies OK on GPU reference (9.11977e-05 < 0.0082)
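For anyone who wants to sanity-check the stats line above, here is a minimal sketch that recomputes flopCnt, GFLOPs and GB/s from the command-line parameters and the reported time. It assumes the usual direct-convolution count of 2 * N * C * K * Ho * Wo * Y * X and that GB/s is (bytesRead + bytesWritten) / time. These formulas are my assumptions and not taken from the driver source, although they do reproduce the reported numbers. The helper name conv_flops is hypothetical.

```python
# Sanity check of the MIOpenDriver stats line quoted in this thread.
# The formulas are assumptions (standard direct-convolution FLOP count),
# not copied from the MIOpenDriver source.

def conv_flops(n, c, k, h_out, w_out, y, x):
    """One multiply and one add per (output element, input channel, filter tap)."""
    return 2 * n * c * k * h_out * w_out * y * x

# Parameters from the command line: -n 256 -c 1024 -H 14 -W 14 -k 256 -y 1 -x 1
# With a 1x1 filter, stride 1 and no padding, the output is also 14x14.
n, c, k = 256, 1024, 256
h_out, w_out = 14, 14
y, x = 1, 1

# Values copied from the stats line.
time_ms = 1.127537
bytes_read = 134742016
bytes_written = 25690112

flop_cnt = conv_flops(n, c, k, h_out, w_out, y, x)
seconds = time_ms * 1e-3
gflops = flop_cnt / seconds / 1e9
gbps = (bytes_read + bytes_written) / seconds / 1e9

print(flop_cnt)      # 26306674688, same as flopCnt in the stats line
print(int(gflops))   # 23331, same as the GFLOPs column
print(int(gbps))     # 142, same as the GB/s column
```

This only confirms that the reported throughput is self-consistent for this run. It says nothing about whether the thread-wise copy logic questioned above is exercised, since the selected solution here is ConvBinWinogradRxSf3x2.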