
Keep LayerNorm accumulator at FP32

Open lakhinderwalia opened this issue 10 months ago • 7 comments

Avoid overflow in the calculation of the variance.
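A minimal numpy sketch of the failure mode this addresses (illustrative only, not MIGraphX code; the value magnitudes and reduction size are assumptions): computing the variance of a large fp16 tensor with an fp16 accumulator overflows the fp16 range (max 65504), while the same reduction accumulated in fp32 stays finite.

```python
import numpy as np

# Illustrative sketch, not MIGraphX code: variance of a large fp16 row
# via E[x^2] - E[x]^2. With an fp16 accumulator the running sums exceed
# the fp16 max (65504) and the result is non-finite; accumulating the
# same inputs in fp32 stays well within range.
rng = np.random.default_rng(0)
x = rng.normal(loc=200.0, scale=1.0, size=262144).astype(np.float16)

# fp16 accumulator: both running sums overflow -> inf - inf = nan
mean16 = np.mean(x, dtype=np.float16)
var16 = np.mean(x * x, dtype=np.float16) - mean16 * mean16

# fp32 accumulator: finite, close to the true variance of 1.0
x32 = x.astype(np.float32)
mean32 = x32.mean()
var32 = np.mean(x32 * x32) - mean32 * mean32

print(np.isfinite(var16), np.isfinite(var32))  # False True
```

The individual fp16 squares (~4e4) are still representable here; it is the accumulation across 262144 elements that overflows, which is why only the accumulator needs to be widened.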

lakhinderwalia avatar Mar 25 '24 20:03 lakhinderwalia

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.81%. Comparing base (3ace932) to head (d8422b2). Report is 151 commits behind head on develop.

Additional details and impacted files
```
@@           Coverage Diff            @@
##           develop    #2925   +/-   ##
========================================
  Coverage    91.81%   91.81%
========================================
  Files          486      486
  Lines        18977    18977
========================================
  Hits         17423    17423
  Misses        1554     1554
```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 25 '24 21:03 codecov[bot]

| Test | Batch | Rate new (d8422b) | Rate old (3b3eca) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 2,957.76 | 2,950.34 | 0.25% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,556.84 | 6,567.55 | -0.16% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,424.33 | 2,421.55 | 0.12% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 3,973.01 | 3,971.69 | 0.03% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,660.71 | 1,659.48 | 0.07% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,598.14 | 2,599.62 | -0.06% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 777.14 | 776.25 | 0.12% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 740.33 | 740.67 | -0.04% | :white_check_mark: |
| slim-mobilenet | 64 | 6,916.73 | 6,926.37 | -0.14% | :white_check_mark: |
| slim-nasnetalarge | 64 | 177.03 | 177.12 | -0.05% | :white_check_mark: |
| slim-resnet50v2 | 64 | 2,876.73 | 2,877.81 | -0.04% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,064.50 | 1,064.78 | -0.03% | :white_check_mark: |
| bert-mrpc-tf | 1 | 485.09 | 499.76 | -2.94% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 548.80 | 431.20 | 27.27% | :high_brightness: |
| pytorch-examples-wlang-lstm | 1 | 496.24 | 349.07 | 42.16% | :high_brightness: |
| torchvision-resnet50_1 | 1 | 779.16 | 794.82 | -1.97% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 441.07 | 397.45 | 10.98% | :high_brightness: |
| cadene-resnext101_1 | 1 | 366.99 | 361.22 | 1.60% | :white_check_mark: |
| onnx-taau-downsample | 1 | 350.00 | 349.66 | 0.10% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 33.63 | 33.64 | -0.06% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 56.69 | 56.59 | 0.17% | :white_check_mark: |
| agentmodel | 1 | 7,743.41 | 7,792.96 | -0.64% | :white_check_mark: |
| unet_fp16 | 2 | 56.32 | 57.44 | -1.95% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 971.52 | 902.13 | 7.69% | :high_brightness: |
| resnet50v1_int8 | 1 | 757.88 | 822.26 | -7.83% | :red_circle: |
| bert_base_cased_fp16 | 64 | 1,011.46 | 1,012.77 | -0.13% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 316.55 | 316.81 | -0.08% | :white_check_mark: |
| bert_large_fp16 | 1 | nan | nan | nan% | :x: |
| distilgpt2_fp16 | 16 | 1,991.10 | 1,994.70 | -0.18% | :white_check_mark: |
| yolov5s | 1 | 517.26 | 514.72 | 0.49% | :white_check_mark: |
| tinyllama | 1 | 45.04 | 45.02 | 0.04% | :white_check_mark: |
| vicuna-fastchat | 1 | 183.00 | 180.47 | 1.40% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 404.85 | 403.06 | 0.44% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 420.84 | 424.49 | -0.86% | :white_check_mark: |

This build is not recommended to merge :red_circle:

migraphx-bot avatar Mar 25 '24 22:03 migraphx-bot


:x: bert-mrpc-onnx: ERROR - check error output

```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/huggingface-transformers/bert_mrpc1.onnx
```

     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
:x: cadene-resnext101_1: ERROR - check error output

```
2024-05-20 17:07:56.211507790 [W:onnxruntime:, model.cc:183 Model] ONNX Runtime only guarantees support for models stamped with opset version 7 or above for opset domain 'ai.onnx'. Please upgrade your model to opset 7 or higher. For now, this opset 6 model may run depending upon legacy support of some older opset version operators.
2024-05-20 17:07:56.217572861 [W:onnxruntime:, transpose_optimizer.cc:28 ApplyImpl] Transpose optimizer failed: Unsupported ONNX opset: 6
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 267, in main
    sess = ort.InferenceSession(model_name,
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 463, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for BatchNormalization(6) node with name ''
```

     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
:x: unet: ERROR - check error output

```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 207, in main
    model = migraphx.parse_onnx(model_name,
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/unet/model.onnx
```

     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:x: bert_large: ERROR - check error output

```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/bert/model.onnx
```

     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

migraphx-bot avatar Mar 25 '24 22:03 migraphx-bot

A run of test_layernorm_large with this change shows very similar numbers (on banff-cyxtera-s81-2).

With this change:

```
# bin/test_verify test_layernorm_large
[ RUN ] test_layernorm_large
[ COMPLETE ] test_layernorm_large (12240.9ms)
[==========] 1 tests ran
```

Original code:

```
# bin/test_verify test_layernorm_large
[ RUN ] test_layernorm_large
[ COMPLETE ] test_layernorm_large (12226ms)
[==========] 1 tests ran
```

lakhinderwalia avatar Apr 01 '24 19:04 lakhinderwalia

> A run of test_layernorm_large with this change shows very similar numbers (on banff-cyxtera-s81-2).
>
> With this change: `# bin/test_verify test_layernorm_large [ RUN ] test_layernorm_large [ COMPLETE ] test_layernorm_large (12240.9ms) [==========] 1 tests ran`
>
> Original code: `# bin/test_verify test_layernorm_large [ RUN ] test_layernorm_large [ COMPLETE ] test_layernorm_large (12226ms) [==========] 1 tests ran`

Is this with the shapes `std::vector<size_t> dims = {1, 32, 8388608};`?

umangyadav avatar Apr 01 '24 20:04 umangyadav

> Is this with the shapes `std::vector<size_t> dims = {1, 32, 8388608};`?

No, test_layernorm_large runs with `std::vector<size_t> dims = {1, 32, 262144};`.

lakhinderwalia avatar Apr 01 '24 20:04 lakhinderwalia

LayerNorm operator perf-report comparison for a large FP16 tensor (half_type, {1, 32, 8388608}):

New code: `gpu::code_object::layernorm_mul_add_kernel: 3.99752ms / 1 = 3.99752ms`

Old code: `gpu::code_object::layernorm_mul_add_kernel: 3.87002ms / 1 = 3.87002ms`
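For context, here is a hedged numpy sketch of the computation being benchmarked: LayerNorm over the last axis of an fp16 tensor with the mean/variance reduction accumulated in fp32. This is not the MIGraphX kernel (which also fuses the trailing mul/add); the function name and eps default are assumptions, and the last axis is shortened from the 8388608 used in the perf run so the check is quick.

```python
import numpy as np

# Sketch only (not the fused MIGraphX kernel): LayerNorm over the last
# axis of an fp16 tensor. The reduction runs in an fp32 accumulator,
# avoiding fp16 overflow, and the normalized result is cast back to fp16.
def layernorm_fp32_acc(x_fp16, eps=1e-5):  # eps is an assumed default
    x = x_fp16.astype(np.float32)            # fp32 accumulator
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)      # fp32 variance: no overflow
    y = (x - mean) / np.sqrt(var + eps)
    return y.astype(np.float16)              # output stays fp16

# Shorter last axis than the perf run's {1, 32, 8388608}, same idea
x = np.random.default_rng(1).normal(loc=200.0, size=(1, 32, 4096)).astype(np.float16)
y = layernorm_fp32_acc(x)
```

Each normalized row should come out with mean ~0 and standard deviation ~1 even though the raw inputs sit around 200, where an fp16 sum-of-squares reduction would overflow for large axes.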

lakhinderwalia avatar Apr 02 '24 03:04 lakhinderwalia