Avoid dynamic memory allocation in kernel launch
Codecov Report
:x: Patch coverage is 88.00000% with 3 lines in your changes missing coverage. Please review.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/program.cpp | 91.67% | 2 Missing :warning: |
| src/env.cpp | 0.00% | 1 Missing :warning: |
Additional details and impacted files
Coverage Diff:

| | develop | #3861 | +/- |
|---|---|---|---|
| Coverage | 92.27% | 92.28% | +0.01% |
| Files | 556 | 556 | |
| Lines | 25832 | 25841 | +9 |
| Hits | 23835 | 23845 | +10 |
| Misses | 1997 | 1996 | -1 |
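The percentages in this report follow directly from the hit and line counts. Below is a minimal C++ sketch of that arithmetic. It assumes Codecov's usual definition (covered lines divided by tracked lines; this report lists no partial lines), and `coverage_percent` is a hypothetical helper, not part of MIGraphX or Codecov.

```cpp
#include <cassert>
#include <cmath>

// Coverage in percent: covered (hit) lines divided by tracked lines.
// This report lists no partial lines, so lines = hits + misses here.
double coverage_percent(long hits, long lines) {
    assert(lines > 0);
    return 100.0 * static_cast<double>(hits) / static_cast<double>(lines);
}

// Project totals from the Coverage Diff:
//   develop: coverage_percent(23835, 25832) is about 92.27
//   #3861:   coverage_percent(23845, 25841) is about 92.28
// Patch totals: src/program.cpp at 91.67% with 2 missing implies 22 of 24
// changed lines hit; src/env.cpp adds 1 missing line. So the patch has
// 22 hits out of 25 lines: coverage_percent(22, 25) == 88.0
```

This also explains why adding only 9 lines moves the project total by just +0.01%. The denominator is about 25,800 lines.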
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/include/migraphx/program.hpp | 100.00% <ø> (ø) | |
| src/env.cpp | 79.41% <0.00%> (ø) | |
| src/program.cpp | 71.45% <91.67%> (+0.50%) | :arrow_up: |
@pfultz2 just need to fix the merge conflicts
Please add a change to use a reference, as mentioned in the review comment: https://github.com/ROCm/AMDMIGraphX/pull/3861#discussion_r1995879764.
What this PR really needed was a reference in the one place where it was critically missing, in the nested calls. That is now fixed here. The rest of the PMR changes are unlikely to make a significant difference. The other reference should also be added.
Also, do not use raw constants, as suggested in my code review comments and, independently, by Copilot.
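The review points above can be illustrated with a small, self-contained C++ sketch. It is not the MIGraphX code from this PR. The names `param_map`, `count_params_by_value`, `count_params`, `launch`, `pack_args`, and `arg_buffer_bytes` are hypothetical, and the 256-byte buffer size is an arbitrary assumption. The sketch contrasts passing a container by value through nested calls, which copies and heap-allocates on every call, with passing it by const reference. It also shows a named constant sizing a `std::pmr` stack buffer instead of a raw literal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for per-launch data; not a MIGraphX type.
using param_map = std::unordered_map<std::string, std::vector<int>>;

// By value: each call copies the map and every vector inside it, which
// allocates; nesting such calls multiplies the copies.
std::size_t count_params_by_value(param_map params) { return params.size(); }

// By const reference: the callee aliases the caller's map, so there is no
// copy and no allocation, however deep the call chain goes.
std::size_t count_params(const param_map& params) { return params.size(); }
std::size_t launch(const param_map& params) { return count_params(params); }

// Named constant instead of a raw literal; the size is an assumption.
constexpr std::size_t arg_buffer_bytes = 256;

// Packs values into a PMR vector backed by a fixed stack buffer. The upstream
// resource is null_memory_resource(), so exceeding the buffer throws
// std::bad_alloc instead of silently falling back to the heap.
std::size_t pack_args(const std::vector<int>& values) {
    std::array<std::byte, arg_buffer_bytes> buffer{};
    std::pmr::monotonic_buffer_resource pool(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());
    std::pmr::vector<int> packed(&pool);
    packed.reserve(values.size());
    packed.insert(packed.end(), values.begin(), values.end());
    return packed.size();
}
```

For example, `launch(m)` forwards the same map down the call chain without copying it, while `count_params_by_value(m)` copies `m` on every call. Using `null_memory_resource()` as the upstream makes any unexpected allocation fail loudly. This can be a cheap way to check that a hot path really stays allocation-free.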
| Test | Batch | Rate new 472938 | Rate old 397919 | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,161.12 | 3,245.86 | -2.61% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,592.64 | 6,951.81 | -5.17% | :red_circle: |
| torchvision-densenet121 | 32 | 2,440.08 | 2,449.22 | -0.37% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,116.66 | 4,167.34 | -1.22% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,666.63 | 1,635.29 | 1.92% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,587.01 | 2,759.38 | -6.25% | :red_circle: |
| cadene-inceptionv4 | 16 | 794.90 | 770.72 | 3.14% | :high_brightness: |
| cadene-resnext64x4 | 16 | 802.73 | 817.99 | -1.87% | :white_check_mark: |
| slim-mobilenet | 64 | 8,210.54 | 7,456.32 | 10.12% | :high_brightness: |
| slim-nasnetalarge | 64 | 221.80 | 210.95 | 5.15% | :high_brightness: |
| slim-resnet50v2 | 64 | 3,296.54 | 3,341.58 | -1.35% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,136.07 | 1,144.86 | -0.77% | :white_check_mark: |
| bert-mrpc-tf | 1 | 487.31 | 445.07 | 9.49% | :high_brightness: |
| pytorch-examples-wlang-gru | 1 | 314.16 | 299.79 | 4.79% | :high_brightness: |
| pytorch-examples-wlang-lstm | 1 | 443.74 | 399.30 | 11.13% | :high_brightness: |
| torchvision-resnet50_1 | 1 | 801.64 | 761.18 | 5.32% | :high_brightness: |
| cadene-dpn92_1 | 1 | 433.89 | 384.27 | 12.91% | :high_brightness: |
| cadene-resnext101_1 | 1 | 367.49 | 391.94 | -6.24% | :red_circle: |
| onnx-taau-downsample | 1 | 398.38 | 395.58 | 0.71% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 31.92 | 33.78 | -5.49% | :red_circle: |
| dlrm-criteoterabyte_fp16 | 1 | 50.97 | 51.23 | -0.50% | :white_check_mark: |
| agentmodel | 1 | 10,055.79 | 9,034.65 | 11.30% | :high_brightness: |
| unet_fp16 | 2 | 58.84 | 59.18 | -0.57% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 995.77 | 989.25 | 0.66% | :white_check_mark: |
| resnet50v1_int8 | 1 | 996.96 | 1,022.00 | -2.45% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,109.94 | 1,106.73 | 0.29% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 343.82 | 345.26 | -0.42% | :white_check_mark: |
| bert_large_fp16 | 1 | 198.11 | 197.13 | 0.50% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,096.26 | 2,115.80 | -0.92% | :white_check_mark: |
| yolov5s | 1 | 588.26 | 576.03 | 2.12% | :white_check_mark: |
| tinyllama | 1 | 43.79 | 43.97 | -0.39% | :white_check_mark: |
| vicuna-fastchat | 1 | 45.09 | 45.28 | -0.42% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 410.05 | 417.53 | -1.79% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 413.84 | 408.53 | 1.30% | :white_check_mark: |
| llama2_7b | 1 | 19.11 | 19.16 | -0.26% | :white_check_mark: |
| qwen1.5-7b | 1 | 23.45 | 23.51 | -0.26% | :white_check_mark: |
| phi3-3.8b | 1 | 26.57 | 26.67 | -0.39% | :white_check_mark: |
| mask-rcnn | 1 | 12.07 | 12.01 | 0.53% | :white_check_mark: |
| llama3-8b | 1 | 21.67 | 21.72 | -0.25% | :white_check_mark: |
| whisper-large-encoder | 1 | 10.17 | 10.21 | -0.42% | :white_check_mark: |
| whisper-large-decoder | 1 | 99.35 | 95.77 | 3.74% | :high_brightness: |
| mistral-7b | 1 | 23.67 | 23.72 | -0.18% | :white_check_mark: |
| FLUX.1-schnell | 1 | 724.29 | 746.70 | -3.00% | :red_circle: |
| nan | nan | nan | nan | nan% | :x: |
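The Diff column matches the relative change of the new rate against the old one. For torchvision-resnet50, (3,161.12 - 3,245.86) / 3,245.86 is about -2.61%. A minimal C++ sketch of that calculation follows. The roughly 3% flagging threshold is only inferred from the rows above and is an assumption: -3.00% is marked red while -2.61% is not, and +3.14% is highlighted while +2.12% is not. The names `rate_diff_percent`, `classify`, and `verdict` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative change of the new rate versus the old rate, in percent.
// A higher rate is better, so a negative value means a slowdown.
double rate_diff_percent(double rate_new, double rate_old) {
    assert(rate_old > 0.0);
    return (rate_new - rate_old) / rate_old * 100.0;
}

// Assumption: inferred from the table, not taken from the CI script.
constexpr double assumed_flag_threshold_percent = 3.0;

enum class verdict { ok, regression, improvement };

// Maps a diff to the table's markers: regression (:red_circle:),
// improvement (:high_brightness:), otherwise ok (:white_check_mark:).
verdict classify(double diff_percent) {
    if (diff_percent <= -assumed_flag_threshold_percent) return verdict::regression;
    if (diff_percent >= assumed_flag_threshold_percent) return verdict::improvement;
    return verdict::ok;
}
```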
This build is not recommended to merge :red_circle:
:x:bert-mrpc-tf: ERROR - check error output
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]
2025-08-29 03:34:34.346623: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
graph = load_tf_graph(model_name)
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
graph_def.ParseFromString(f.read())
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
self._preread_check()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
:red_circle:mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output