AMDMIGraphX
Keep LayerNorm accumulator at FP32
Avoid overflow in the variance calculation.
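A minimal NumPy sketch (not the MIGraphX kernel itself) of the idea behind this change: the data stays FP16, but the mean/variance reductions accumulate in FP32, so the running sum of squares cannot exceed FP16's maximum finite value of 65504.

```python
import numpy as np

def layernorm_fp16_with_fp32_acc(x_fp16, eps=1e-5):
    # Upcast once; mean and variance reductions accumulate in FP32,
    # then the normalized result is cast back to FP16.
    x = x_fp16.astype(np.float32)
    mean = x.mean(axis=-1, keepdims=True)
    var = np.square(x - mean).mean(axis=-1, keepdims=True)
    return ((x - mean) / np.sqrt(var + eps)).astype(np.float16)

x = (np.random.default_rng(0).standard_normal((32, 4096)) * 8).astype(np.float16)
y = layernorm_fp16_with_fp32_acc(x)
```

The function name and shapes here are illustrative only; the actual change lives in the fused layernorm GPU kernel.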
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.81%. Comparing base (
3ace932
) to head (d8422b2
). Report is 151 commits behind head on develop.
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2925   +/-  ##
========================================
  Coverage    91.81%   91.81%
========================================
  Files          486      486
  Lines        18977    18977
========================================
  Hits         17423    17423
  Misses        1554     1554
:umbrella: View full report in Codecov by Sentry.
Test | Batch | Rate new d8422b | Rate old 3b3eca | Diff | Compare
---|---|---|---|---|---
torchvision-resnet50 | 64 | 2,957.76 | 2,950.34 | 0.25% | :white_check_mark: |
torchvision-resnet50_fp16 | 64 | 6,556.84 | 6,567.55 | -0.16% | :white_check_mark: |
torchvision-densenet121 | 32 | 2,424.33 | 2,421.55 | 0.12% | :white_check_mark: |
torchvision-densenet121_fp16 | 32 | 3,973.01 | 3,971.69 | 0.03% | :white_check_mark: |
torchvision-inceptionv3 | 32 | 1,660.71 | 1,659.48 | 0.07% | :white_check_mark: |
torchvision-inceptionv3_fp16 | 32 | 2,598.14 | 2,599.62 | -0.06% | :white_check_mark: |
cadene-inceptionv4 | 16 | 777.14 | 776.25 | 0.12% | :white_check_mark: |
cadene-resnext64x4 | 16 | 740.33 | 740.67 | -0.04% | :white_check_mark: |
slim-mobilenet | 64 | 6,916.73 | 6,926.37 | -0.14% | :white_check_mark: |
slim-nasnetalarge | 64 | 177.03 | 177.12 | -0.05% | :white_check_mark: |
slim-resnet50v2 | 64 | 2,876.73 | 2,877.81 | -0.04% | :white_check_mark: |
bert-mrpc-onnx | 8 | 1,064.50 | 1,064.78 | -0.03% | :white_check_mark: |
bert-mrpc-tf | 1 | 485.09 | 499.76 | -2.94% | :white_check_mark: |
pytorch-examples-wlang-gru | 1 | 548.80 | 431.20 | 27.27% | :high_brightness: |
pytorch-examples-wlang-lstm | 1 | 496.24 | 349.07 | 42.16% | :high_brightness: |
torchvision-resnet50_1 | 1 | 779.16 | 794.82 | -1.97% | :white_check_mark: |
cadene-dpn92_1 | 1 | 441.07 | 397.45 | 10.98% | :high_brightness: |
cadene-resnext101_1 | 1 | 366.99 | 361.22 | 1.60% | :white_check_mark: |
onnx-taau-downsample | 1 | 350.00 | 349.66 | 0.10% | :white_check_mark: |
dlrm-criteoterabyte | 1 | 33.63 | 33.64 | -0.06% | :white_check_mark: |
dlrm-criteoterabyte_fp16 | 1 | 56.69 | 56.59 | 0.17% | :white_check_mark: |
agentmodel | 1 | 7,743.41 | 7,792.96 | -0.64% | :white_check_mark: |
unet_fp16 | 2 | 56.32 | 57.44 | -1.95% | :white_check_mark: |
resnet50v1_fp16 | 1 | 971.52 | 902.13 | 7.69% | :high_brightness: |
resnet50v1_int8 | 1 | 757.88 | 822.26 | -7.83% | :red_circle: |
bert_base_cased_fp16 | 64 | 1,011.46 | 1,012.77 | -0.13% | :white_check_mark: |
bert_large_uncased_fp16 | 32 | 316.55 | 316.81 | -0.08% | :white_check_mark: |
bert_large_fp16 | 1 | nan | nan | nan% | :x: |
distilgpt2_fp16 | 16 | 1,991.10 | 1,994.70 | -0.18% | :white_check_mark: |
yolov5s | 1 | 517.26 | 514.72 | 0.49% | :white_check_mark: |
tinyllama | 1 | 45.04 | 45.02 | 0.04% | :white_check_mark: |
vicuna-fastchat | 1 | 183.00 | 180.47 | 1.40% | :white_check_mark: |
whisper-tiny-encoder | 1 | 404.85 | 403.06 | 0.44% | :white_check_mark: |
whisper-tiny-decoder | 1 | 420.84 | 424.49 | -0.86% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:x:bert-mrpc-onnx: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/huggingface-transformers/bert_mrpc1.onnx
:x:cadene-resnext101_1: ERROR - check error output
2024-05-20 17:07:56.211507790 [W:onnxruntime:, model.cc:183 Model] ONNX Runtime only guarantees support for models stamped with opset version 7 or above for opset domain 'ai.onnx'. Please upgrade your model to opset 7 or higher. For now, this opset 6 model may run depending upon legacy support of some older opset version operators.
2024-05-20 17:07:56.217572861 [W:onnxruntime:, transpose_optimizer.cc:28 ApplyImpl] Transpose optimizer failed: Unsupported ONNX opset: 6
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 267, in main
sess = ort.InferenceSession(model_name,
File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in init
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 463, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for BatchNormalization(6) node with name ''
:x:unet: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 207, in main
model = migraphx.parse_onnx(model_name,
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/unet/model.onnx
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
:x:bert_large: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/bert/model.onnx
Testing test_layernorm_large with this change shows very similar numbers (on banff-cyxtera-s81-2).
With this change:
# bin/test_verify test_layernorm_large
[ RUN ] test_layernorm_large
[ COMPLETE ] test_layernorm_large (12240.9ms)
[==========] 1 tests ran
Original code:
# bin/test_verify test_layernorm_large
[ RUN ] test_layernorm_large
[ COMPLETE ] test_layernorm_large (12226ms)
[==========] 1 tests ran
Is this with the shape std::vector<size_t> dims = {1, 32, 8388608};?
This is with std::vector<size_t> dims = {1, 32, 262144};
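A minimal NumPy illustration (again, not the MIGraphX kernel) of why the accumulator width matters at the large test shape: with 8388608 elements in the reduced dimension, a sum of squares held in FP16 overflows long before the reduction finishes, while an FP32 accumulator handles it.

```python
import numpy as np

# Innermost dimension from the large verify test shape, {1, 32, 8388608}.
n = 8388608
x = np.full(n, 16.0, dtype=np.float16)

# Sum of squares with an FP16 accumulator: partial sums pass FP16's
# maximum finite value (65504) and the result becomes inf.
s16 = (x * x).sum(dtype=np.float16)

# The same reduction with an FP32 accumulator is exact here:
# 8388608 * 256 = 2**31, which FP32 represents exactly.
s32 = (x * x).astype(np.float32).sum(dtype=np.float32)

print(s16, s32)  # inf 2147483648.0
```

The constant input value is chosen only to make the overflow deterministic; real activations hit the same limit once the running sum crosses 65504.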
LayerNorm operator perf report comparison for a large FP16 tensor:
half_type, {1, 32, 8388608}
New code: gpu::code_object::layernorm_mul_add_kernel: 3.99752ms / 1 = 3.99752ms
Old code: gpu::code_object::layernorm_mul_add_kernel: 3.87002ms / 1 = 3.87002ms