
For some combinations, bnorm gives incorrect results

Open RambabuSwargam opened this issue 1 year ago • 9 comments

./bin/MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -r 1
Backwards prop batch norm verification passed on dx.
Backwards prop batch norm verification FAILED on dscale: 0.844537
max difference in dscale: 0
Backwards prop batch norm verification passed on dbias.

Attached MIOpen log: BWD_Failure.log

Could someone help identify the issue, or is there a workaround for this?

Other combinations that show the issue:
./bin/MIOpenDriver bnorm -n 16 -c 80 -H 73 -W 73 -m 1 --forw 0 -b 1 -r 1
./bin/MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -r 1

RambabuSwargam avatar Mar 16 '23 13:03 RambabuSwargam

[attribution] @junliume @johnny-keker https://github.com/ROCmSoftwarePlatform/MIOpen/labels/bug https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_normal Please assign this to me and to @muralinr

atamazov avatar Mar 21 '23 12:03 atamazov

@RambabuSwargam Please update your ROCm from 5.0 to the latest version and re-test. If the bug persists, please attach a log taken with MIOPEN_LOG_LEVEL=6. Thanks.

atamazov avatar Mar 21 '23 12:03 atamazov

@muralinr could you take a look at this issue?

junliume avatar Jul 06 '23 21:07 junliume

This config passes in my setup; the other two appear to have an issue. I will look at the failing ones.

./bin/MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -r 1
MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -r 1
Backwards prop batch norm verification passed on dx (0.00022004)
Backwards prop batch norm verification passed on dscale (0.0392205)
Backwards prop batch norm verification passed on dbias (0.000186849)
Backwards Prop Batch Norm Verifies on CPU and GPU.

muralinr avatar Jul 06 '23 21:07 muralinr

> @RambabuSwargam Please update your ROCm from 5.0 to the latest version and re-test. If the bug persists, please attach a log taken with MIOPEN_LOG_LEVEL=6. Thanks.

@RambabuSwargam Ping. We need this info to reproduce the issue.

atamazov avatar Jul 06 '23 23:07 atamazov

I reproduced this issue for two of the above configs. The batchnorm backward input arguments are incorrect here: backward batchnorm uses the saved mean (-s option), not the running mean (-r option). With the right inputs, all three batchnorm backward configs above pass in my test setup.

./bin/MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -s 1
MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -s 1
Backwards prop batch norm verification passed on dx (3.56658e-06)
Backwards prop batch norm verification passed on dscale (1.08358e-05)
Backwards prop batch norm verification passed on dbias (9.76879e-06)
Backwards Prop Batch Norm Verifies on CPU and GPU.

./bin/MIOpenDriver bnorm -n 16 -c 80 -H 73 -W 73 -m 1 --forw 0 -b 1 -s 1
MIOpenDriver bnorm -n 16 -c 80 -H 73 -W 73 -m 1 --forw 0 -b 1 -s 1
Backwards prop batch norm verification passed on dx (1.02773e-05)
Backwards prop batch norm verification passed on dscale (3.38332e-05)
Backwards prop batch norm verification passed on dbias (4.68447e-05)
Backwards Prop Batch Norm Verifies on CPU and GPU.

./bin/MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -s 1
MIOpenDriver bnorm -n 16 -c 96 -H 35 -W 35 -m 1 --forw 0 -b 1 -s 1
Backwards prop batch norm verification passed on dx (7.39998e-05)
Backwards prop batch norm verification passed on dscale (0.000171115)
Backwards prop batch norm verification passed on dbias (0.000217263)
Backwards Prop Batch Norm Verifies on CPU and GPU.

Here is how the saved mean and the running mean are used in batchnorm:

  1. “saved mean”
  • It is calculated for each mini-batch; each mini-batch has a different saved mean.
  • It is calculated in forward training and used in backward training.
  2. “running mean”
  • It is calculated by averaging the old mean (from previous mini-batches) with the new mean (from the current mini-batch).
  • It is calculated during forward training and used in inference.
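
The distinction can be sketched for a single channel in plain Python. This is only an illustration, not MIOpen code: the function names, the `momentum` value of 0.1, the `eps` value, and the toy data are all assumptions. The blend `(1 - momentum) * old + momentum * new` is one common convention for the running average.

```python
import math

def batch_stats(xs, eps=1e-5):
    """Mean and inverse standard deviation of one mini-batch (one channel)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, 1.0 / math.sqrt(var + eps)

def forward_training(xs, running_mean, momentum=0.1):
    """Return the per-batch saved statistics and the updated running mean."""
    saved_mean, saved_inv_std = batch_stats(xs)
    # Running mean: blend the old running value with the current batch mean.
    running_mean = (1.0 - momentum) * running_mean + momentum * saved_mean
    return saved_mean, saved_inv_std, running_mean

def backward_dscale(xs, dys, mean, inv_std):
    """dscale = sum(dy * x_hat), where x_hat = (x - mean) * inv_std."""
    return sum(dy * (x - mean) * inv_std for x, dy in zip(xs, dys))

# Toy data (assumed): two mini-batches with very different means.
batches = [[1.0, 2.0, 3.0, 4.0], [10.0, 12.0, 14.0, 16.0]]
running_mean = 0.0
for xs in batches:
    saved_mean, saved_inv_std, running_mean = forward_training(xs, running_mean)

# Backward pass for the last mini-batch with an all-ones upstream gradient.
dys = [1.0] * len(xs)
dscale_saved = backward_dscale(xs, dys, saved_mean, saved_inv_std)
dscale_running = backward_dscale(xs, dys, running_mean, saved_inv_std)
print("saved mean:", saved_mean, "running mean:", running_mean)
print("dscale with saved mean:", dscale_saved)
print("dscale with running mean:", dscale_running)
```

In this toy case the saved mean matches the mini-batch being differentiated, while the running mean still lags far behind it, so the two dscale values differ widely. That is the general reason the backward pass is fed the saved statistics rather than the running ones.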

muralinr avatar Jul 07 '23 00:07 muralinr

@muralinr Good. So the only problem is that the combination of "--forw 0" and "-r 1" is allowed in the driver, right?

atamazov avatar Jul 07 '23 07:07 atamazov

> @muralinr Good. So the only problem is that the combination of "--forw 0" and "-r 1" is allowed in the driver, right?

Yes, Artem. "--forw 0 -b 1 -r 1" should not be used; "--forw 0 -b 1 -s 1" should be used for backward.
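
Until the driver rejects that combination itself, a small pre-flight check on the command line could catch it. The following is a hypothetical helper, not part of MIOpenDriver; it encodes only the rule stated in this thread, and the function names are made up for illustration.

```python
import shlex

def parse_flags(command):
    """Collect '-flag value' pairs from a driver command line into a dict."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens) - 1:
        if tokens[i].startswith("-"):
            flags[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return flags

def check_backward_flags(command):
    """Return warnings for a backward-only bnorm run (rule taken from this thread)."""
    flags = parse_flags(command)
    warnings = []
    if flags.get("--forw") == "0" and flags.get("-b") == "1":
        if flags.get("-r") == "1":
            warnings.append('"-r 1" (running mean) should not be used for backward')
        if flags.get("-s") != "1":
            warnings.append('"-s 1" (saved mean) should be used for backward')
    return warnings

bad = "./bin/MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -r 1"
good = "./bin/MIOpenDriver bnorm -n 16 -c 64 -H 147 -W 147 -m 1 --forw 0 -b 1 -s 1"
print(check_backward_flags(bad))
print(check_backward_flags(good))
```

The first command produces two warnings and the second produces none. The parser is deliberately naive (it assumes every flag is followed by exactly one value), which matches the commands in this issue but not necessarily every driver invocation.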

muralinr avatar Jul 07 '23 16:07 muralinr

@RambabuSwargam Can you please test with the latest ROCm 6.1.0? If resolved, please close the ticket. Thanks!

ppanchad-amd avatar Apr 23 '24 15:04 ppanchad-amd