MIOpen
For some combinations, bnorm gives an incorrect result:
./bin/MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -r 1
Backwards prop batch norm verification passed on dx.
Backwards prop batch norm verification FAILED on dscale: 0.844537 max difference in dscale: 0
Backwards prop batch norm verification passed on dbias.
Attached MIOpen log: BWD_Failure.log
Could someone help identify the issue, or is there a workaround for this?
Some other combinations that also show the issue:
./bin/MIOpenDriver bnorm -n 16 -c 80 -H 73 -W 73 -m 1 --forw 0 -b 1 -r 1
./bin/MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -r 1
@junliume @johnny-keker Please assign this to me and to @muralinr.
@RambabuSwargam Please update your ROCm from 5.0 to the latest version and re-test. If the bug persists, please attach a log taken with MIOPEN_LOG_LEVEL=6. Thanks.
@muralinr could you take a look at this issue?
The above config passes in my setup. It looks like the other two configs have an issue; I will look at the failed ones.
./bin/MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -r 1
MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -r 1
Backwards prop batch norm verification passed on dx (0.00022004)
Backwards prop batch norm verification passed on dscale (0.0392205)
Backwards prop batch norm verification passed on dbias (0.000186849)
Backwards Prop Batch Norm Verifies on CPU and GPU.
@RambabuSwargam Ping. We need this info to reproduce the issue.
I reproduced this issue for two of the above configs. The batchnorm backward input arguments are incorrect here: backward batchnorm uses the saved mean (-s option), not the running mean (-r option). With the correct inputs, all three batchnorm backward configs above pass in my test setup.
./bin/MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -s 1
MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -s 1
Backwards prop batch norm verification passed on dx (3.56658e-06)
Backwards prop batch norm verification passed on dscale (1.08358e-05)
Backwards prop batch norm verification passed on dbias (9.76879e-06)
Backwards Prop Batch Norm Verifies on CPU and GPU.

./bin/MIOpenDriver bnorm -n 16 -c 80 -H 73 -W 73 -m 1 --forw 0 -b 1 -s 1
MIOpenDriver bnorm -n 16 -c 80 -H 73 -W 73 -m 1 --forw 0 -b 1 -s 1
Backwards prop batch norm verification passed on dx (1.02773e-05)
Backwards prop batch norm verification passed on dscale (3.38332e-05)
Backwards prop batch norm verification passed on dbias (4.68447e-05)
Backwards Prop Batch Norm Verifies on CPU and GPU.

./bin/MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -s 1
MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -s 1
Backwards prop batch norm verification passed on dx (7.39998e-05)
Backwards prop batch norm verification passed on dscale (0.000171115)
Backwards prop batch norm verification passed on dbias (0.000217263)
Backwards Prop Batch Norm Verifies on CPU and GPU.
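To see where the mean enters the backward pass, here is a minimal single-channel sketch of the textbook batchnorm backward formulas in Python. It is not MIOpen's kernel code: the function name `backward`, the sample data, and the alternative mean of 0.25 are made up for illustration, and a biased batch variance is assumed. dbias is just the sum of dy and never touches the mean, while dscale = sum(dy * x_hat) depends on whichever mean and inverse standard deviation are passed in, so backward has to receive the same per-batch statistics that forward training saved.

```python
import math

def backward(x, dy, mean, inv_std, scale=1.0):
    """Single-channel batchnorm backward, given the statistics used in forward."""
    n = len(x)
    x_hat = [(v - mean) * inv_std for v in x]          # normalized input
    dbias = sum(dy)                                    # independent of mean / inv_std
    dscale = sum(g * h for g, h in zip(dy, x_hat))     # depends on mean / inv_std
    dx = [scale * inv_std / n * (n * g - dbias - h * dscale)
          for g, h in zip(dy, x_hat)]
    return dx, dscale, dbias

x = [1.0, 2.0, 3.0, 4.0]
dy = [0.5, -1.0, 2.0, 0.25]
eps = 1e-5
saved_mean = sum(x) / len(x)                           # per-batch ("saved") mean
saved_inv_std = 1.0 / math.sqrt(sum((v - saved_mean) ** 2 for v in x) / len(x) + eps)

_, dscale_saved, dbias_saved = backward(x, dy, saved_mean, saved_inv_std)
# Hypothetical different mean (e.g. a running mean) fed to the same backward pass.
_, dscale_other, dbias_other = backward(x, dy, 0.25, saved_inv_std)

print(dbias_saved == dbias_other)    # True: dbias does not use the mean
print(dscale_saved, dscale_other)    # the two dscale values differ
```

With the saved statistics, the normalized inputs sum to zero, so `sum(dx)` also comes out at (numerically) zero, which is a handy sanity check on the formulas.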
Here is how saved mean and running mean are used in Batchnorm:
- “saved mean”
  - It is calculated for each mini-batch; each mini-batch has a different saved mean.
  - The saved mean is calculated in forward training and used in backward training.
- “running mean”
  - It is a running average: the old mean (from previous mini-batches) is blended with the new mean (from the current mini-batch).
  - It is calculated during forward training and used in inference.
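The two statistics can be illustrated with a minimal pure-Python sketch for a single channel. The function names, the averaging factor of 0.1, and the use of the biased batch variance for the running variance are assumptions made for brevity (libraries differ in details such as an unbiased correction for the running variance), and scale/bias are fixed at 1 and 0.

```python
import math

def forward_training(x, running_mean, running_var, factor=0.1, eps=1e-5):
    """Normalize one channel of a mini-batch; return (y, saved stats, running stats)."""
    n = len(x)
    mean = sum(x) / n                              # saved mean: this mini-batch only
    var = sum((v - mean) ** 2 for v in x) / n      # biased batch variance (simplification)
    inv_std = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * inv_std for v in x]          # scale=1, bias=0 for brevity
    # Running stats: blend old values (previous batches) with this batch's values.
    running_mean = (1.0 - factor) * running_mean + factor * mean
    running_var = (1.0 - factor) * running_var + factor * var
    return y, (mean, inv_std), (running_mean, running_var)

def forward_inference(x, running_mean, running_var, eps=1e-5):
    """Inference ignores the batch's own statistics and uses the running ones."""
    inv_std = 1.0 / math.sqrt(running_var + eps)
    return [(v - running_mean) * inv_std for v in x]

running = (0.0, 1.0)                               # initial running mean / variance
saved_means = []
for batch in ([1.0, 2.0, 3.0, 4.0], [10.0, 11.0, 12.0, 13.0]):
    _, saved, running = forward_training(batch, *running)
    saved_means.append(saved[0])

print(saved_means)   # [2.5, 11.5]: each mini-batch has its own saved mean
print(running[0])    # ≈ 1.375: the running mean moves only gradually
```

Each mini-batch produces its own saved mean, while the running mean drifts slowly toward the batch means, which is why it suits inference rather than the backward pass of the batch that was just normalized.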
@muralinr Good. So the only problem is that the combination of "--forw 0" and "-r 1" is allowed in the driver, right?
Yes, Artem. "--forw 0 -b 1 -r 1" should not be used; "--forw 0 -b 1 -s 1" should be used for backward.
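Following the suggestion that the driver should not accept this combination, a pre-flight check could look like the hypothetical Python helper below. It is not part of MIOpenDriver: `parse_flags` and `backward_flag_warnings` are made-up names, the parser assumes every flag is followed by a value, and it only looks at the flags used in this thread (`--forw`, `-b`, `-r`, `-s`), warning when a backward-only run (`--forw 0 -b 1`) asks for `-r 1`.

```python
from __future__ import annotations

import shlex

def parse_flags(cmd: str) -> dict[str, str]:
    """Hypothetical helper: map each '-x value' pair in a command line to a dict."""
    tokens = shlex.split(cmd)
    flags: dict[str, str] = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-") and i + 1 < len(tokens):
            flags[tok] = tokens[i + 1]   # assumes every flag takes a value
            i += 2
        else:
            i += 1                       # program name, subcommand, etc.
    return flags

def backward_flag_warnings(cmd: str) -> list[str]:
    """Hypothetical check: warn when a backward-only run requests the running mean."""
    f = parse_flags(cmd)
    backward_only = f.get("--forw") == "0" and f.get("-b") == "1"
    warnings = []
    if backward_only and f.get("-r") == "1":
        warnings.append("backward uses the saved mean: replace '-r 1' with '-s 1'")
    return warnings

bad = "./bin/MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -r 1"
good = "./bin/MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -s 1"
print(backward_flag_warnings(bad))    # one warning
print(backward_flag_warnings(good))   # []
```

A real fix would belong in the driver's own argument handling; this sketch only shows the rule stated above as a check.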
@RambabuSwargam Can you please test with the latest ROCm 6.1.0? If resolved, please close the ticket. Thanks!