AMDMIGraphX wait() failing for the default stream 0

This PR fixes two issues that occur when using the default stream on the GPU, i.e. with MIGRAPHX_ENABLE_NULL_STREAM=1:

Verifying a model (resnet50v2 here) throws this exception: context.hpp:365: get_elapsed_ms: Failed hipEventElapsedTime: device not ready
Running perf on a model with the default stream reports overly optimistic results, because the GPU synchronization is incorrectly bypassed.

Mar 07 '25 19:03 lakhinderwalia

/AzurePipelines run

Mar 07 '25 22:03 jayhawk-commits

Azure Pipelines successfully started running 1 pipeline(s).

Mar 07 '25 22:03 azure-pipelines[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3873   +/-   ##
========================================
  Coverage    92.04%   92.04%           
========================================
  Files          531      531           
  Lines        24526    24526           
========================================
  Hits         22573    22573           
  Misses        1953     1953

Mar 08 '25 03:03 codecov[bot]

> Since this relies on undocumented behavior, a unit test needs to be added.

https://rocm.docs.amd.com/projects/HIP/en/docs-develop/doxygen/html/group___stream.html#gabbfb9f573a6ebe8c478605ecb5504a74

This is not undocumented behavior.

Jun 09 '25 23:06 lakhinderwalia

Test	Batch	Rate new 2c990d	Rate old 090b5b	Diff	Compare
torchvision-resnet50	64	3,240.05	3,236.02	0.12%	:white_check_mark:
torchvision-resnet50_fp16	64	6,886.10	6,878.18	0.12%	:white_check_mark:
torchvision-densenet121	32	2,441.46	2,442.41	-0.04%	:white_check_mark:
torchvision-densenet121_fp16	32	4,182.04	4,186.75	-0.11%	:white_check_mark:
torchvision-inceptionv3	32	1,620.49	1,618.68	0.11%	:white_check_mark:
torchvision-inceptionv3_fp16	32	2,708.44	2,707.37	0.04%	:white_check_mark:
cadene-inceptionv4	16	755.45	755.87	-0.06%	:white_check_mark:
cadene-resnext64x4	16	814.14	814.28	-0.02%	:white_check_mark:
slim-mobilenet	64	7,438.34	7,434.38	0.05%	:white_check_mark:
slim-nasnetalarge	64	208.67	208.54	0.06%	:white_check_mark:
slim-resnet50v2	64	3,334.91	3,330.30	0.14%	:white_check_mark:
bert-mrpc-onnx	8	1,141.85	1,140.97	0.08%	:white_check_mark:
bert-mrpc-tf	1	460.03	459.23	0.17%	:white_check_mark:
pytorch-examples-wlang-gru	1	345.08	343.39	0.49%	:white_check_mark:
pytorch-examples-wlang-lstm	1	473.25	472.19	0.23%	:white_check_mark:
torchvision-resnet50_1	1	799.39	792.95	0.81%	:white_check_mark:
cadene-dpn92_1	1	414.31	412.79	0.37%	:white_check_mark:
cadene-resnext101_1	1	392.20	392.73	-0.14%	:white_check_mark:
onnx-taau-downsample	1	394.56	395.29	-0.18%	:white_check_mark:
dlrm-criteoterabyte	1	32.21	32.20	0.01%	:white_check_mark:
dlrm-criteoterabyte_fp16	1	51.27	51.30	-0.05%	:white_check_mark:
agentmodel	1	10,166.97	10,374.99	-2.01%	:white_check_mark:
unet_fp16	2	59.40	59.43	-0.05%	:white_check_mark:
resnet50v1_fp16	1	1,042.28	1,040.55	0.17%	:white_check_mark:
resnet50v1_int8	1	1,063.18	1,073.64	-0.97%	:white_check_mark:
bert_base_cased_fp16	64	1,170.99	1,170.80	0.02%	:white_check_mark:
bert_large_uncased_fp16	32	356.47	356.49	-0.00%	:white_check_mark:
bert_large_fp16	1	201.98	199.69	1.15%	:white_check_mark:
distilgpt2_fp16	16	2,224.43	2,229.36	-0.22%	:white_check_mark:
yolov5s	1	545.58	541.91	0.68%	:white_check_mark:
tinyllama	1	43.65	43.69	-0.09%	:white_check_mark:
vicuna-fastchat	1	44.79	44.89	-0.22%	:white_check_mark:
whisper-tiny-encoder	1	418.12	417.80	0.08%	:white_check_mark:
whisper-tiny-decoder	1	401.84	409.80	-1.94%	:white_check_mark:
llama2_7b	1	19.07	19.04	0.19%	:white_check_mark:
qwen1.5-7b	1	23.42	23.43	-0.06%	:white_check_mark:
phi3-3.8b	1	26.54	26.55	-0.03%	:white_check_mark:
mask-rcnn	1	12.68	12.72	-0.35%	:white_check_mark:
llama3-8b	1	21.61	21.65	-0.19%	:white_check_mark:
whisper-large-encoder	1	10.18	10.18	-0.02%	:white_check_mark:
whisper-large-decoder	1	101.29	101.24	0.04%	:white_check_mark:
mistral-7b	1	23.68	23.68	-0.01%	:white_check_mark:
FLUX.1-schnell	1	770.83	771.09	-0.03%	:white_check_mark:
nan	nan	nan	nan	nan%	:x:

This build is not recommended to merge :red_circle:

Jun 10 '25 06:06 migraphx-bot

:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

:x:bert-mrpc-tf: ERROR - check error output

2025-06-10 00:28:55.048454: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1749533340.438932 182934 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62973 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:32:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1749533341.302674 182934 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-06-10 00:29:09.621814: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-06-10 00:29:09.621985: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-06-10 00:29:09.622024: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-06-10 00:29:09.622068: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-06-10 00:29:09.622110: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-06-10 00:29:09.622150: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-06-10 00:29:09.622188: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-06-10 00:29:09.622228: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-06-10 00:29:09.623293: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-06-10 00:29:09.624389: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-06-10 00:29:09.624410: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-06-10 00:29:09.624421: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-06-10 00:29:09.624437: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
return fn(*args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 324, in main
y_out = sess.run(y, feed_dict=tf_dict)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance

:red_circle:unet: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance

:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance

:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance

:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance

:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

:white_check_mark: llama2_7b: PASSED: MIGraphX meets tolerance

:white_check_mark: qwen1.5-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: phi3-3.8b: PASSED: MIGraphX meets tolerance

:red_circle:mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: llama3-8b: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-large-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: mistral-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: FLUX.1-schnell: PASSED: MIGraphX meets tolerance

Jun 10 '25 06:06 migraphx-bot
