
Avoid dynamic memory allocation in kernel launch

Open · pfultz2 opened this issue 10 months ago · 1 comment

pfultz2 · Mar 03 '25 14:03

Codecov Report

:x: Patch coverage is 88.00000% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/program.cpp | 91.67% | 2 Missing :warning: |
| src/env.cpp | 0.00% | 1 Missing :warning: |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3861      +/-   ##
===========================================
+ Coverage    92.27%   92.28%   +0.01%     
===========================================
  Files          556      556              
  Lines        25832    25841       +9     
===========================================
+ Hits         23835    23845      +10     
+ Misses        1997     1996       -1     
| Files with missing lines | Coverage Δ |
|---|---|
| src/include/migraphx/program.hpp | 100.00% <ø> (ø) |
| src/env.cpp | 79.41% <0.00%> (ø) |
| src/program.cpp | 71.45% <91.67%> (+0.50%) :arrow_up: |

codecov[bot] · Mar 03 '25 16:03

@pfultz2 just need to fix the merge conflicts

causten · Apr 21 '25 21:04

> @pfultz2 just need to fix the merge conflicts

Please add the change to use a reference, as mentioned in the review comment: https://github.com/ROCm/AMDMIGraphX/pull/3861#discussion_r1995879764. What this PR mainly needed was a reference in the one place where it was critically missing, in the nested calls; that is now fixed here, and the rest of the PMR changes are unlikely to make a significant difference. The other reference should also be added.

Also, do not use raw constants, as suggested in my code review comments and, independently, by Copilot.

lakhinderwalia · Jun 11 '25 16:06

| Test | Batch | Rate new (472938) | Rate old (397919) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,161.12 | 3,245.86 | -2.61% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,592.64 | 6,951.81 | -5.17% | :red_circle: |
| torchvision-densenet121 | 32 | 2,440.08 | 2,449.22 | -0.37% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,116.66 | 4,167.34 | -1.22% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,666.63 | 1,635.29 | 1.92% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,587.01 | 2,759.38 | -6.25% | :red_circle: |
| cadene-inceptionv4 | 16 | 794.90 | 770.72 | 3.14% | :high_brightness: |
| cadene-resnext64x4 | 16 | 802.73 | 817.99 | -1.87% | :white_check_mark: |
| slim-mobilenet | 64 | 8,210.54 | 7,456.32 | 10.12% | :high_brightness: |
| slim-nasnetalarge | 64 | 221.80 | 210.95 | 5.15% | :high_brightness: |
| slim-resnet50v2 | 64 | 3,296.54 | 3,341.58 | -1.35% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,136.07 | 1,144.86 | -0.77% | :white_check_mark: |
| bert-mrpc-tf | 1 | 487.31 | 445.07 | 9.49% | :high_brightness: |
| pytorch-examples-wlang-gru | 1 | 314.16 | 299.79 | 4.79% | :high_brightness: |
| pytorch-examples-wlang-lstm | 1 | 443.74 | 399.30 | 11.13% | :high_brightness: |
| torchvision-resnet50_1 | 1 | 801.64 | 761.18 | 5.32% | :high_brightness: |
| cadene-dpn92_1 | 1 | 433.89 | 384.27 | 12.91% | :high_brightness: |
| cadene-resnext101_1 | 1 | 367.49 | 391.94 | -6.24% | :red_circle: |
| onnx-taau-downsample | 1 | 398.38 | 395.58 | 0.71% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 31.92 | 33.78 | -5.49% | :red_circle: |
| dlrm-criteoterabyte_fp16 | 1 | 50.97 | 51.23 | -0.50% | :white_check_mark: |
| agentmodel | 1 | 10,055.79 | 9,034.65 | 11.30% | :high_brightness: |
| unet_fp16 | 2 | 58.84 | 59.18 | -0.57% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 995.77 | 989.25 | 0.66% | :white_check_mark: |
| resnet50v1_int8 | 1 | 996.96 | 1,022.00 | -2.45% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,109.94 | 1,106.73 | 0.29% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 343.82 | 345.26 | -0.42% | :white_check_mark: |
| bert_large_fp16 | 1 | 198.11 | 197.13 | 0.50% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,096.26 | 2,115.80 | -0.92% | :white_check_mark: |
| yolov5s | 1 | 588.26 | 576.03 | 2.12% | :white_check_mark: |
| tinyllama | 1 | 43.79 | 43.97 | -0.39% | :white_check_mark: |
| vicuna-fastchat | 1 | 45.09 | 45.28 | -0.42% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 410.05 | 417.53 | -1.79% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 413.84 | 408.53 | 1.30% | :white_check_mark: |
| llama2_7b | 1 | 19.11 | 19.16 | -0.26% | :white_check_mark: |
| qwen1.5-7b | 1 | 23.45 | 23.51 | -0.26% | :white_check_mark: |
| phi3-3.8b | 1 | 26.57 | 26.67 | -0.39% | :white_check_mark: |
| mask-rcnn | 1 | 12.07 | 12.01 | 0.53% | :white_check_mark: |
| llama3-8b | 1 | 21.67 | 21.72 | -0.25% | :white_check_mark: |
| whisper-large-encoder | 1 | 10.17 | 10.21 | -0.42% | :white_check_mark: |
| whisper-large-decoder | 1 | 99.35 | 95.77 | 3.74% | :high_brightness: |
| mistral-7b | 1 | 23.67 | 23.72 | -0.18% | :white_check_mark: |
| FLUX.1-schnell | 1 | 724.29 | 746.70 | -3.00% | :red_circle: |
| nan | nan | nan | nan | nan% | :x: |

This build is not recommended to merge :red_circle:

migraphx-bot · Aug 29 '25 09:08


:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
:x: bert-mrpc-tf: ERROR - check error output

error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

2025-08-29 03:34:34.346623: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
    graph = load_tf_graph(model_name)
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
    graph_def.ParseFromString(f.read())
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
    self._preread_check()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
    self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
:white_check_mark: unet: PASSED: MIGraphX meets tolerance
:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance
:white_check_mark: llama2_7b: PASSED: MIGraphX meets tolerance
:white_check_mark: qwen1.5-7b: PASSED: MIGraphX meets tolerance
:white_check_mark: phi3-3.8b: PASSED: MIGraphX meets tolerance
:red_circle: mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output
:white_check_mark: llama3-8b: PASSED: MIGraphX meets tolerance
:white_check_mark: whisper-large-decoder: PASSED: MIGraphX meets tolerance
:white_check_mark: mistral-7b: PASSED: MIGraphX meets tolerance
:white_check_mark: FLUX.1-schnell: PASSED: MIGraphX meets tolerance

migraphx-bot · Aug 29 '25 09:08