AMDMIGraphX: Set attribute to help bypass the warning about amdgpu_waves_per_eu

The upgraded toolchain emits a new compile warning, which -Werror promotes to an error, so it must be bypassed for the topk test to compile and run.

[ RUN ] test_topk<migraphx::shape::half_type, 1000, 120000>
/tmp/comgr-d7a292/input/main.cpp:11:22: error: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in 'topk_kernel': desired occupancy was 2, final occupancy is 1 [-Werror,-Wpass-failed]

Aug 08 '25 19:08 lakhinderwalia

Test	Batch	Rate new 918283	Rate old 018cae	Diff	Compare
torchvision-resnet50	64	3,247.40	3,248.22	-0.03%	:white_check_mark:
torchvision-resnet50_fp16	64	6,961.71	6,961.14	0.01%	:white_check_mark:
torchvision-densenet121	32	2,450.76	2,449.97	0.03%	:white_check_mark:
torchvision-densenet121_fp16	32	4,170.57	4,164.44	0.15%	:white_check_mark:
torchvision-inceptionv3	32	1,636.60	1,636.10	0.03%	:white_check_mark:
torchvision-inceptionv3_fp16	32	2,760.19	2,752.74	0.27%	:white_check_mark:
cadene-inceptionv4	16	771.57	771.22	0.04%	:white_check_mark:
cadene-resnext64x4	16	818.91	818.68	0.03%	:white_check_mark:
slim-mobilenet	64	7,460.56	7,459.63	0.01%	:white_check_mark:
slim-nasnetalarge	64	211.06	211.08	-0.01%	:white_check_mark:
slim-resnet50v2	64	3,344.50	3,342.04	0.07%	:white_check_mark:
bert-mrpc-onnx	8	1,145.14	1,145.42	-0.02%	:white_check_mark:
bert-mrpc-tf	1	445.65	442.53	0.70%	:white_check_mark:
pytorch-examples-wlang-gru	1	294.77	298.60	-1.28%	:white_check_mark:
pytorch-examples-wlang-lstm	1	404.81	411.85	-1.71%	:white_check_mark:
torchvision-resnet50_1	1	767.18	767.54	-0.05%	:white_check_mark:
cadene-dpn92_1	1	386.19	392.42	-1.59%	:white_check_mark:
cadene-resnext101_1	1	392.00	393.79	-0.45%	:white_check_mark:
onnx-taau-downsample	1	395.74	395.77	-0.01%	:white_check_mark:
dlrm-criteoterabyte	1	33.76	33.77	-0.03%	:white_check_mark:
dlrm-criteoterabyte_fp16	1	51.24	51.25	-0.02%	:white_check_mark:
agentmodel	1	8,334.03	8,931.04	-6.68%	:red_circle:
unet_fp16	2	59.14	59.14	0.01%	:white_check_mark:
resnet50v1_fp16	1	980.63	978.22	0.25%	:white_check_mark:
resnet50v1_int8	1	1,030.81	1,025.85	0.48%	:white_check_mark:
bert_base_cased_fp16	64	1,107.47	1,107.56	-0.01%	:white_check_mark:
bert_large_uncased_fp16	32	345.29	345.42	-0.04%	:white_check_mark:
bert_large_fp16	1	197.48	196.96	0.26%	:white_check_mark:
distilgpt2_fp16	16	2,117.26	2,118.48	-0.06%	:white_check_mark:
yolov5s	1	566.10	570.35	-0.75%	:white_check_mark:
tinyllama	1	43.95	43.98	-0.07%	:white_check_mark:
vicuna-fastchat	1	45.27	45.38	-0.22%	:white_check_mark:
whisper-tiny-encoder	1	417.62	417.79	-0.04%	:white_check_mark:
whisper-tiny-decoder	1	400.51	409.99	-2.31%	:white_check_mark:
llama2_7b	1	19.16	19.16	0.00%	:white_check_mark:
qwen1.5-7b	1	23.53	23.54	-0.02%	:white_check_mark:
phi3-3.8b	1	26.70	26.67	0.10%	:white_check_mark:
mask-rcnn	1	12.51	12.44	0.57%	:white_check_mark:
llama3-8b	1	21.72	21.73	-0.05%	:white_check_mark:
whisper-large-encoder	1	10.22	10.22	-0.01%	:white_check_mark:
whisper-large-decoder	1	96.60	96.35	0.26%	:white_check_mark:
mistral-7b	1	23.73	23.74	-0.05%	:white_check_mark:
FLUX.1-schnell	1	738.85	742.55	-0.50%	:white_check_mark:
nan	nan	nan	nan	nan%	:x:

This build is not recommended to merge :red_circle:
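
The Diff column in the table above appears consistent with the relative change of the new rate against the old one, (new - old) / old. A minimal Python sketch under that assumption follows; the function name `percent_diff` is hypothetical and is not taken from the benchmark tooling, and the rows are copied from the table.

```python
def percent_diff(rate_new: float, rate_old: float) -> float:
    """Relative change of rate_new against rate_old, in percent.

    Assumed formula: (new - old) / old * 100. The bot's actual
    implementation is not shown in this thread.
    """
    if rate_old == 0:
        raise ValueError("rate_old must be non-zero")
    return (rate_new - rate_old) / rate_old * 100.0


# Rows copied from the table above: (test, rate new, rate old).
rows = [
    ("agentmodel", 8334.03, 8931.04),
    ("whisper-tiny-decoder", 400.51, 409.99),
    ("cadene-dpn92_1", 386.19, 392.42),
]

for name, new, old in rows:
    print(f"{name}: {percent_diff(new, old):.2f}%")
```

Because the rates in the table are already rounded to two decimals, a value recomputed this way can differ from the listed Diff in the last digit for some rows.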

Aug 08 '25 22:08 migraphx-bot

:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

:x: bert-mrpc-tf: ERROR - check error output

error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

2025-08-08 16:42:13.363812: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1754689338.796337 173517 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62951 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:b3:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1754689339.695586 173517 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-08-08 16:42:28.331802: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.331993: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332063: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332117: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332162: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332192: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332243: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-08-08 16:42:28.332293: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-08-08 16:42:28.333307: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-08-08 16:42:28.334401: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-08-08 16:42:28.334420: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-08-08 16:42:28.334431: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-08-08 16:42:28.334448: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 335, in main
    y_out = sess.run(y, feed_dict=tf_dict)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
    raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance

:red_circle: unet: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance

:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance

:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance

:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance

:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

:white_check_mark: llama2_7b: PASSED: MIGraphX meets tolerance

:white_check_mark: qwen1.5-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: phi3-3.8b: PASSED: MIGraphX meets tolerance

:red_circle: mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: llama3-8b: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-large-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: mistral-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: FLUX.1-schnell: PASSED: MIGraphX meets tolerance

Aug 08 '25 22:08 migraphx-bot

Set attribute to help bypass the warning about amdgpu_waves_per_eu