AMDMIGraphX Fuse multiple outputs for pointwise ops

Mar 06 '25 23:03 pfultz2

Codecov Report

Attention: Patch coverage is 97.15640% with 6 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/fuse_concat.cpp	88.24%	2 Missing :warning:
src/fuse_pointwise.cpp	98.28%	2 Missing :warning:
src/fuse_pointwise_reduce.cpp	0.00%	2 Missing :warning:

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3870      +/-   ##
===========================================
+ Coverage    92.13%   92.18%   +0.04%     
===========================================
  Files          528      528              
  Lines        24179    24307     +128     
===========================================
+ Hits         22277    22405     +128     
  Misses        1902     1902
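For reference, the percentages in the diff above can be recomputed from the hit and line counts. This is a minimal sketch that assumes coverage is simply hits divided by lines; the helper name `coverage_pct` is hypothetical and not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return coverage as an unrounded percentage (assumed: hits / lines)."""
    return 100.0 * hits / lines


# Counts taken from the coverage diff: develop (base) vs. this PR (head).
base = coverage_pct(22277, 24179)
head = coverage_pct(22405, 24307)

# The delta is computed from the unrounded values, then rounded for display.
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# prints "92.13% -> 92.18% (+0.04%)"
```

Taking the delta before rounding matters here: subtracting the two rounded figures would give 0.05 rather than the reported 0.04.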

Files with missing lines	Coverage Δ
src/include/migraphx/fuse_pointwise.hpp	`100.00% <ø> (ø)`
src/include/migraphx/instruction.hpp	`100.00% <ø> (ø)`
src/include/migraphx/matcher.hpp	`96.05% <100.00%> (+0.03%)`	:arrow_up:
src/include/migraphx/module.hpp	`100.00% <ø> (ø)`
src/include/migraphx/shape.hpp	`93.02% <ø> (ø)`
src/instruction.cpp	`88.46% <100.00%> (+0.32%)`	:arrow_up:
src/module.cpp	`86.68% <100.00%> (+0.33%)`	:arrow_up:
src/param_utils.cpp	`97.14% <100.00%> (+0.59%)`	:arrow_up:
src/replace_allocate.cpp	`100.00% <100.00%> (ø)`
src/shape.cpp	`92.23% <100.00%> (+0.02%)`	:arrow_up:
... and 3 more

... and 1 file with indirect coverage changes

Mar 07 '25 01:03 codecov[bot]

/AzurePipelines run

Mar 07 '25 22:03 jayhawk-commits

Azure Pipelines successfully started running 1 pipeline(s).

Mar 07 '25 22:03 azure-pipelines[bot]

Test	Batch	Rate new 964d42	Rate old fa3c63	Diff	Compare
torchvision-resnet50	64	3,236.72	3,254.91	-0.56%	:white_check_mark:
torchvision-resnet50_fp16	64	6,899.95	6,935.32	-0.51%	:white_check_mark:
torchvision-densenet121	32	2,445.49	2,455.93	-0.42%	:white_check_mark:
torchvision-densenet121_fp16	32	4,217.16	4,225.55	-0.20%	:white_check_mark:
torchvision-inceptionv3	32	1,617.28	1,626.96	-0.59%	:white_check_mark:
torchvision-inceptionv3_fp16	32	2,703.96	2,719.91	-0.59%	:white_check_mark:
cadene-inceptionv4	16	756.28	761.21	-0.65%	:white_check_mark:
cadene-resnext64x4	16	814.00	819.12	-0.62%	:white_check_mark:
slim-mobilenet	64	7,424.86	7,473.64	-0.65%	:white_check_mark:
slim-nasnetalarge	64	215.85	217.85	-0.92%	:white_check_mark:
slim-resnet50v2	64	3,332.73	3,461.67	-3.72%	:red_circle:
bert-mrpc-onnx	8	1,145.06	1,153.83	-0.76%	:white_check_mark:
bert-mrpc-tf	1	458.52	466.76	-1.77%	:white_check_mark:
pytorch-examples-wlang-gru	1	346.02	496.78	-30.35%	:red_circle:
pytorch-examples-wlang-lstm	1	480.20	447.88	7.22%	:high_brightness:
torchvision-resnet50_1	1	817.15	815.85	0.16%	:white_check_mark:
cadene-dpn92_1	1	434.00	426.64	1.73%	:white_check_mark:
cadene-resnext101_1	1	392.03	393.64	-0.41%	:white_check_mark:
onnx-taau-downsample	1	394.81	395.64	-0.21%	:white_check_mark:
dlrm-criteoterabyte	1	32.18	32.35	-0.52%	:white_check_mark:
dlrm-criteoterabyte_fp16	1	51.19	51.29	-0.20%	:white_check_mark:
agentmodel	1	10,539.43	10,384.82	1.49%	:white_check_mark:
unet_fp16	2	59.38	59.58	-0.33%	:white_check_mark:
resnet50v1_fp16	1	1,091.21	1,082.55	0.80%	:white_check_mark:
resnet50v1_int8	1	1,066.89	1,060.66	0.59%	:white_check_mark:
bert_base_cased_fp16	64	1,162.07	1,170.69	-0.74%	:white_check_mark:
bert_large_uncased_fp16	32	356.04	357.93	-0.53%	:white_check_mark:
bert_large_fp16	1	199.70	200.81	-0.55%	:white_check_mark:
distilgpt2_fp16	16	2,225.58	2,238.86	-0.59%	:white_check_mark:
yolov5s	1	537.00	543.73	-1.24%	:white_check_mark:
tinyllama	1	43.64	43.88	-0.56%	:white_check_mark:
vicuna-fastchat	1	44.76	45.03	-0.59%	:white_check_mark:
whisper-tiny-encoder	1	419.40	421.18	-0.42%	:white_check_mark:
whisper-tiny-decoder	1	410.88	412.83	-0.47%	:white_check_mark:
llama2_7b	1	nan	nan	nan%	:x:
qwen1.5-7b	1	23.45	23.54	-0.38%	:white_check_mark:
phi3-3.8b	1	nan	nan	nan%	:x:
mask-rcnn	1	21.40	22.12	-3.26%	:red_circle:
llama3-8b	1	21.67	21.74	-0.30%	:white_check_mark:
whisper-large-encoder	1	10.17	10.22	-0.48%	:white_check_mark:
whisper-large-decoder	1	99.91	99.81	0.10%	:white_check_mark:
mistral-7b	1	23.67	23.74	-0.33%	:white_check_mark:
FLUX.1-schnell	1	921.79	910.85	1.20%	:white_check_mark:
nan	nan	nan	nan	nan%	:x:

This build is not recommended to merge :red_circle:
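The Diff column appears to be the relative change of the new rate against the old one. The sketch below recomputes it for a few rows under that assumption. The helper names `parse_rate` and `percent_diff` are hypothetical and not taken from the benchmark tooling, and the thresholds behind the status icons are not known, so they are not modeled.

```python
import math


def parse_rate(text: str) -> float:
    """Parse a rate such as '3,236.72'; 'nan' becomes float('nan')."""
    return float(text.replace(",", ""))


def percent_diff(new: str, old: str) -> float:
    """Relative change of new vs. old, in percent (assumed formula)."""
    n, o = parse_rate(new), parse_rate(old)
    return (n - o) / o * 100.0


# Rows copied from the table: (test, rate new, rate old).
rows = [
    ("torchvision-resnet50", "3,236.72", "3,254.91"),
    ("pytorch-examples-wlang-gru", "346.02", "496.78"),
    ("pytorch-examples-wlang-lstm", "480.20", "447.88"),
    ("llama2_7b", "nan", "nan"),
]

for name, new, old in rows:
    diff = percent_diff(new, old)
    # NaN rows (models that failed to run) propagate as NaN.
    label = "nan" if math.isnan(diff) else f"{diff:.2f}%"
    print(f"{name}: {label}")
```

Rounded to two decimals, these values agree with the reported Diff entries for the rows shown, including the -30.35% regression on `pytorch-examples-wlang-gru`.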

Apr 23 '25 21:04 migraphx-bot

:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

:x:bert-mrpc-tf: ERROR - check error output

2025-04-23 15:24:24.082336: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1745439869.653186 162945 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62973 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:32:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1745439870.545578 162945 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-04-23 15:24:39.493475: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493529: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493756: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493795: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493824: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493866: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493907: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493949: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-04-23 15:24:39.495237: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-04-23 15:24:39.496567: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-04-23 15:24:39.496589: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-04-23 15:24:39.496602: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-04-23 15:24:39.496619: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
return fn(*args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 324, in main
y_out = sess.run(y, feed_dict=tf_dict)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance

:red_circle:unet: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance

:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance

:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance

:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance

:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

:x:llama2_7b: ERROR - check error output

Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:265: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/llama2_7b/decoder_model.onnx

:x:qwen1.5-7b: ERROR - check error output

usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256

:x:phi3-3.8b: ERROR - check error output

usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256

:red_circle:mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: llama3-8b: PASSED: MIGraphX meets tolerance

:x:#whisper-large-encoder: ERROR - check error output

Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/op/convolution.hpp:100: normalize_compute_shape: CONVOLUTION: mismatched channel numbers

:white_check_mark: whisper-large-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: mistral-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: FLUX.1-schnell: PASSED: MIGraphX meets tolerance

Apr 23 '25 21:04 migraphx-bot