Set attribute to help bypass the warning about amdgpu_waves_per_eu
The upgraded toolchain emits a new compile warning, which `-Werror` promotes to an error. It has to be bypassed for the topk test to compile and run.
[ RUN ] test_topk<migraphx::shape::half_type, 1000, 120000>
/tmp/comgr-d7a292/input/main.cpp:11:22: error: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in 'topk_kernel': desired occupancy was 2, final occupancy is 1 [-Werror,-Wpass-failed]
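For reviewers unfamiliar with the attribute, here is a minimal sketch of how an explicit occupancy hint could be attached to a function. It is not the code from this change: the macro `SKETCH_WAVES_PER_EU` and the function `topk_sketch` are hypothetical names, and using a minimum of 1 wave per execution unit is an assumption based on the "final occupancy is 1" text in the error. Only Clang's `amdgpu_waves_per_eu` attribute itself is real. On a HIP kernel the macro would sit next to `__global__`; here it guards a host stand-in so the snippet builds with any C++ compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper macro (not part of MIGraphX). When compiling device
// code for AMDGPU it requests a minimum number of waves per execution unit;
// on every other target it expands to nothing.
#if defined(__AMDGCN__)
#define SKETCH_WAVES_PER_EU(min_waves) __attribute__((amdgpu_waves_per_eu(min_waves)))
#else
#define SKETCH_WAVES_PER_EU(min_waves)
#endif

// Host stand-in for a top-k kernel, used only to show where the attribute
// would be placed. Assumption: asking for 1 wave instead of 2 gives the
// compiler an occupancy target it can actually meet.
SKETCH_WAVES_PER_EU(1)
std::vector<float> topk_sketch(std::vector<float> values, std::size_t k)
{
    assert(k <= values.size());
    // Move the k largest elements to the front, in descending order.
    std::partial_sort(values.begin(),
                      values.begin() + static_cast<std::ptrdiff_t>(k),
                      values.end(),
                      std::greater<float>());
    values.resize(k);
    return values;
}
```

Calling `topk_sketch({1.0f, 5.0f, 3.0f, 4.0f}, 2)` returns the two largest values in descending order. Whether relaxing the hint or disabling `-Wpass-failed` for the kernel is the better choice depends on how much occupancy matters for topk.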
| Test | Batch | Rate new (918283) | Rate old (018cae) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,247.40 | 3,248.22 | -0.03% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,961.71 | 6,961.14 | 0.01% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,450.76 | 2,449.97 | 0.03% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,170.57 | 4,164.44 | 0.15% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,636.60 | 1,636.10 | 0.03% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,760.19 | 2,752.74 | 0.27% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 771.57 | 771.22 | 0.04% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 818.91 | 818.68 | 0.03% | :white_check_mark: |
| slim-mobilenet | 64 | 7,460.56 | 7,459.63 | 0.01% | :white_check_mark: |
| slim-nasnetalarge | 64 | 211.06 | 211.08 | -0.01% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,344.50 | 3,342.04 | 0.07% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,145.14 | 1,145.42 | -0.02% | :white_check_mark: |
| bert-mrpc-tf | 1 | 445.65 | 442.53 | 0.70% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 294.77 | 298.60 | -1.28% | :white_check_mark: |
| pytorch-examples-wlang-lstm | 1 | 404.81 | 411.85 | -1.71% | :white_check_mark: |
| torchvision-resnet50_1 | 1 | 767.18 | 767.54 | -0.05% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 386.19 | 392.42 | -1.59% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 392.00 | 393.79 | -0.45% | :white_check_mark: |
| onnx-taau-downsample | 1 | 395.74 | 395.77 | -0.01% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 33.76 | 33.77 | -0.03% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 51.24 | 51.25 | -0.02% | :white_check_mark: |
| agentmodel | 1 | 8,334.03 | 8,931.04 | -6.68% | :red_circle: |
| unet_fp16 | 2 | 59.14 | 59.14 | 0.01% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 980.63 | 978.22 | 0.25% | :white_check_mark: |
| resnet50v1_int8 | 1 | 1,030.81 | 1,025.85 | 0.48% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,107.47 | 1,107.56 | -0.01% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 345.29 | 345.42 | -0.04% | :white_check_mark: |
| bert_large_fp16 | 1 | 197.48 | 196.96 | 0.26% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,117.26 | 2,118.48 | -0.06% | :white_check_mark: |
| yolov5s | 1 | 566.10 | 570.35 | -0.75% | :white_check_mark: |
| tinyllama | 1 | 43.95 | 43.98 | -0.07% | :white_check_mark: |
| vicuna-fastchat | 1 | 45.27 | 45.38 | -0.22% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 417.62 | 417.79 | -0.04% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 400.51 | 409.99 | -2.31% | :white_check_mark: |
| llama2_7b | 1 | 19.16 | 19.16 | 0.00% | :white_check_mark: |
| qwen1.5-7b | 1 | 23.53 | 23.54 | -0.02% | :white_check_mark: |
| phi3-3.8b | 1 | 26.70 | 26.67 | 0.10% | :white_check_mark: |
| mask-rcnn | 1 | 12.51 | 12.44 | 0.57% | :white_check_mark: |
| llama3-8b | 1 | 21.72 | 21.73 | -0.05% | :white_check_mark: |
| whisper-large-encoder | 1 | 10.22 | 10.22 | -0.01% | :white_check_mark: |
| whisper-large-decoder | 1 | 96.60 | 96.35 | 0.26% | :white_check_mark: |
| mistral-7b | 1 | 23.73 | 23.74 | -0.05% | :white_check_mark: |
| FLUX.1-schnell | 1 | 738.85 | 742.55 | -0.50% | :white_check_mark: |
| nan | nan | nan | nan | nan% | :x: |
This build is not recommended to merge :red_circle:
:x:bert-mrpc-tf: ERROR - check error output
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]
2025-08-08 16:42:13.363812: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1754689338.796337 173517 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62951 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:b3:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1754689339.695586 173517 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-08-08 16:42:28.331802: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.331993: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332063: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332117: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332162: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332192: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332243: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332293: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-08-08 16:42:28.333307: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-08-08 16:42:28.334401: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-08-08 16:42:28.334420: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-08-08 16:42:28.334431: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-08-08 16:42:28.334448: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
return fn(*args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 335, in main
y_out = sess.run(y, feed_dict=tf_dict)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':
:red_circle:unet: FAILED: MIGraphX is not within tolerance - check verbose output
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
:red_circle:mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output