AMDMIGraphX
Keep LayerNorm accumulator at FP32
Avoid overflow in the variance calculation.
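A minimal NumPy sketch (not the MIGraphX kernel itself) of the idea behind this change: the data stays FP16, but the mean/variance reductions accumulate in FP32, so the running sum of squares cannot exceed FP16's maximum finite value of 65504.

```python
import numpy as np

def layernorm_fp16_with_fp32_acc(x_fp16, eps=1e-5):
    # Upcast once; mean and variance reductions accumulate in FP32,
    # then the normalized result is cast back to FP16.
    x = x_fp16.astype(np.float32)
    mean = x.mean(axis=-1, keepdims=True)
    var = np.square(x - mean).mean(axis=-1, keepdims=True)
    return ((x - mean) / np.sqrt(var + eps)).astype(np.float16)

x = (np.random.default_rng(0).standard_normal((32, 4096)) * 8).astype(np.float16)
y = layernorm_fp16_with_fp32_acc(x)
```

The function name and shapes here are illustrative only; the actual change lives in the fused layernorm GPU kernel.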
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.81%. Comparing base (
3ace932
) to head (d8422b2
). Report is 151 commits behind head on develop.
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2925   +/-  ##
========================================
  Coverage    91.81%   91.81%
========================================
  Files          486      486
  Lines        18977    18977
========================================
  Hits         17423    17423
  Misses        1554     1554
:umbrella: View full report in Codecov by Sentry.
Test | Batch | Rate new d8422b | Rate old 3b3eca | Diff | Compare
---|---|---|---|---|---
torchvision-resnet50 | 64 | 2,957.76 | 2,950.34 | 0.25% | :white_check_mark: |
torchvision-resnet50_fp16 | 64 | 6,556.84 | 6,567.55 | -0.16% | :white_check_mark: |
torchvision-densenet121 | 32 | 2,424.33 | 2,421.55 | 0.12% | :white_check_mark: |
torchvision-densenet121_fp16 | 32 | 3,973.01 | 3,971.69 | 0.03% | :white_check_mark: |
torchvision-inceptionv3 | 32 | 1,660.71 | 1,659.48 | 0.07% | :white_check_mark: |
torchvision-inceptionv3_fp16 | 32 | 2,598.14 | 2,599.62 | -0.06% | :white_check_mark: |
cadene-inceptionv4 | 16 | 777.14 | 776.25 | 0.12% | :white_check_mark: |
cadene-resnext64x4 | 16 | 740.33 | 740.67 | -0.04% | :white_check_mark: |
slim-mobilenet | 64 | 6,916.73 | 6,926.37 | -0.14% | :white_check_mark: |
slim-nasnetalarge | 64 | 177.03 | 177.12 | -0.05% | :white_check_mark: |
slim-resnet50v2 | 64 | 2,876.73 | 2,877.81 | -0.04% | :white_check_mark: |
bert-mrpc-onnx | 8 | 1,064.50 | 1,064.78 | -0.03% | :white_check_mark: |
bert-mrpc-tf | 1 | 485.09 | 499.76 | -2.94% | :white_check_mark: |
pytorch-examples-wlang-gru | 1 | 548.80 | 431.20 | 27.27% | :high_brightness: |
pytorch-examples-wlang-lstm | 1 | 496.24 | 349.07 | 42.16% | :high_brightness: |
torchvision-resnet50_1 | 1 | 779.16 | 794.82 | -1.97% | :white_check_mark: |
cadene-dpn92_1 | 1 | 441.07 | 397.45 | 10.98% | :high_brightness: |
cadene-resnext101_1 | 1 | 366.99 | 361.22 | 1.60% | :white_check_mark: |
onnx-taau-downsample | 1 | 350.00 | 349.66 | 0.10% | :white_check_mark: |
dlrm-criteoterabyte | 1 | 33.63 | 33.64 | -0.06% | :white_check_mark: |
dlrm-criteoterabyte_fp16 | 1 | 56.69 | 56.59 | 0.17% | :white_check_mark: |
agentmodel | 1 | 7,743.41 | 7,792.96 | -0.64% | :white_check_mark: |
unet_fp16 | 2 | 56.32 | 57.44 | -1.95% | :white_check_mark: |
resnet50v1_fp16 | 1 | 971.52 | 902.13 | 7.69% | :high_brightness: |
resnet50v1_int8 | 1 | 757.88 | 822.26 | -7.83% | :red_circle: |
bert_base_cased_fp16 | 64 | 1,011.46 | 1,012.77 | -0.13% | :white_check_mark: |
bert_large_uncased_fp16 | 32 | 316.55 | 316.81 | -0.08% | :white_check_mark: |
bert_large_fp16 | 1 | nan | nan | nan% | :x: |
distilgpt2_fp16 | 16 | 1,991.10 | 1,994.70 | -0.18% | :white_check_mark: |
yolov5s | 1 | 517.26 | 514.72 | 0.49% | :white_check_mark: |
tinyllama | 1 | 45.04 | 45.02 | 0.04% | :white_check_mark: |
vicuna-fastchat | 1 | 183.00 | 180.47 | 1.40% | :white_check_mark: |
whisper-tiny-encoder | 1 | 404.85 | 403.06 | 0.44% | :white_check_mark: |
whisper-tiny-decoder | 1 | 420.84 | 424.49 | -0.86% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:x:bert-mrpc-onnx: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/huggingface-transformers/bert_mrpc1.onnx
:x:cadene-resnext101_1: ERROR - check error output
2024-05-20 17:07:56.211507790 [W:onnxruntime:, model.cc:183 Model] ONNX Runtime only guarantees support for models stamped with opset version 7 or above for opset domain 'ai.onnx'. Please upgrade your model to opset 7 or higher. For now, this opset 6 model may run depending upon legacy support of some older opset version operators.
2024-05-20 17:07:56.217572861 [W:onnxruntime:, transpose_optimizer.cc:28 ApplyImpl] Transpose optimizer failed: Unsupported ONNX opset: 6
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 267, in main
sess = ort.InferenceSession(model_name,
File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in init
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 463, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for BatchNormalization(6) node with name ''
:x:unet: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 207, in main
model = migraphx.parse_onnx(model_name,
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/unet/model.onnx
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
:x:bert_large: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/bert/model.onnx
Testing test_layernorm_large with this change shows very similar numbers (on banff-cyxtera-s81-2).
With this change:
# bin/test_verify test_layernorm_large
[ RUN ] test_layernorm_large
[ COMPLETE ] test_layernorm_large (12240.9ms)
[==========] 1 tests ran
Original code:
# bin/test_verify test_layernorm_large
[ RUN ] test_layernorm_large
[ COMPLETE ] test_layernorm_large (12226ms)
[==========] 1 tests ran
Is this with the shape std::vector<size_t> dims = {1, 32, 8388608};?
This is with std::vector<size_t> dims = {1, 32, 262144};
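A minimal NumPy illustration (again, not the MIGraphX kernel) of why the accumulator width matters at the large test shape: with 8388608 elements in the reduced dimension, a sum of squares held in FP16 overflows long before the reduction finishes, while an FP32 accumulator handles it.

```python
import numpy as np

# Innermost dimension from the large verify test shape, {1, 32, 8388608}.
n = 8388608
x = np.full(n, 16.0, dtype=np.float16)

# Sum of squares with an FP16 accumulator: partial sums pass FP16's
# maximum finite value (65504) and the result becomes inf.
s16 = (x * x).sum(dtype=np.float16)

# The same reduction with an FP32 accumulator is exact here:
# 8388608 * 256 = 2**31, which FP32 represents exactly.
s32 = (x * x).astype(np.float32).sum(dtype=np.float32)

print(s16, s32)  # inf 2147483648.0
```

The constant input value is chosen only to make the overflow deterministic; real activations hit the same limit once the running sum crosses 65504.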
LayerNorm operator perf report comparison for a large FP16 tensor:
half_type, {1, 32, 8388608}
New code: gpu::code_object::layernorm_mul_add_kernel: 3.99752ms / 1 = 3.99752ms
Old code: gpu::code_object::layernorm_mul_add_kernel: 3.87002ms / 1 = 3.87002ms