Fuse multiple outputs for pointwise ops
Codecov Report
Attention: Patch coverage is 97.15640% with 6 lines in your changes missing coverage. Please review.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/fuse_concat.cpp | 88.24% | 2 Missing :warning: |
| src/fuse_pointwise.cpp | 98.28% | 2 Missing :warning: |
| src/fuse_pointwise_reduce.cpp | 0.00% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## develop #3870 +/- ##
===========================================
+ Coverage 92.13% 92.18% +0.04%
===========================================
Files 528 528
Lines 24179 24307 +128
===========================================
+ Hits 22277 22405 +128
Misses 1902 1902
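The percentages in the diff above can be re-derived from the hit and line counts. The snippet below is only a sanity-check sketch, not part of Codecov's tooling; the function name `coverage_pct` is made up for illustration, and the counts are copied from the report.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return line coverage as an unrounded percentage (hits / lines * 100)."""
    return 100.0 * hits / lines


# Counts copied from the coverage diff (base = develop, head = this PR).
base = coverage_pct(22277, 24179)
head = coverage_pct(22405, 24307)

print(f"base:  {base:.2f}%")
print(f"head:  {head:.2f}%")
# The delta is computed on the unrounded values and rounded afterwards.
print(f"delta: {head - base:+.2f}%")
```

Misses are unchanged at 1902 while 128 lines and 128 hits were added, which is why total coverage rises slightly.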
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/include/migraphx/fuse_pointwise.hpp | 100.00% <ø> (ø) | |
| src/include/migraphx/instruction.hpp | 100.00% <ø> (ø) | |
| src/include/migraphx/matcher.hpp | 96.05% <100.00%> (+0.03%) | :arrow_up: |
| src/include/migraphx/module.hpp | 100.00% <ø> (ø) | |
| src/include/migraphx/shape.hpp | 93.02% <ø> (ø) | |
| src/instruction.cpp | 88.46% <100.00%> (+0.32%) | :arrow_up: |
| src/module.cpp | 86.68% <100.00%> (+0.33%) | :arrow_up: |
| src/param_utils.cpp | 97.14% <100.00%> (+0.59%) | :arrow_up: |
| src/replace_allocate.cpp | 100.00% <100.00%> (ø) | |
| src/shape.cpp | 92.23% <100.00%> (+0.02%) | :arrow_up: |
| ... and 3 more | | |
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
| Test | Batch | Rate new 964d42 | Rate old fa3c63 | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,236.72 | 3,254.91 | -0.56% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,899.95 | 6,935.32 | -0.51% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,445.49 | 2,455.93 | -0.42% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,217.16 | 4,225.55 | -0.20% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,617.28 | 1,626.96 | -0.59% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,703.96 | 2,719.91 | -0.59% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 756.28 | 761.21 | -0.65% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 814.00 | 819.12 | -0.62% | :white_check_mark: |
| slim-mobilenet | 64 | 7,424.86 | 7,473.64 | -0.65% | :white_check_mark: |
| slim-nasnetalarge | 64 | 215.85 | 217.85 | -0.92% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,332.73 | 3,461.67 | -3.72% | :red_circle: |
| bert-mrpc-onnx | 8 | 1,145.06 | 1,153.83 | -0.76% | :white_check_mark: |
| bert-mrpc-tf | 1 | 458.52 | 466.76 | -1.77% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 346.02 | 496.78 | -30.35% | :red_circle: |
| pytorch-examples-wlang-lstm | 1 | 480.20 | 447.88 | 7.22% | :high_brightness: |
| torchvision-resnet50_1 | 1 | 817.15 | 815.85 | 0.16% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 434.00 | 426.64 | 1.73% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 392.03 | 393.64 | -0.41% | :white_check_mark: |
| onnx-taau-downsample | 1 | 394.81 | 395.64 | -0.21% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 32.18 | 32.35 | -0.52% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 51.19 | 51.29 | -0.20% | :white_check_mark: |
| agentmodel | 1 | 10,539.43 | 10,384.82 | 1.49% | :white_check_mark: |
| unet_fp16 | 2 | 59.38 | 59.58 | -0.33% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 1,091.21 | 1,082.55 | 0.80% | :white_check_mark: |
| resnet50v1_int8 | 1 | 1,066.89 | 1,060.66 | 0.59% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,162.07 | 1,170.69 | -0.74% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 356.04 | 357.93 | -0.53% | :white_check_mark: |
| bert_large_fp16 | 1 | 199.70 | 200.81 | -0.55% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,225.58 | 2,238.86 | -0.59% | :white_check_mark: |
| yolov5s | 1 | 537.00 | 543.73 | -1.24% | :white_check_mark: |
| tinyllama | 1 | 43.64 | 43.88 | -0.56% | :white_check_mark: |
| vicuna-fastchat | 1 | 44.76 | 45.03 | -0.59% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 419.40 | 421.18 | -0.42% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 410.88 | 412.83 | -0.47% | :white_check_mark: |
| llama2_7b | 1 | nan | nan | nan% | :x: |
| qwen1.5-7b | 1 | 23.45 | 23.54 | -0.38% | :white_check_mark: |
| phi3-3.8b | 1 | nan | nan | nan% | :x: |
| mask-rcnn | 1 | 21.40 | 22.12 | -3.26% | :red_circle: |
| llama3-8b | 1 | 21.67 | 21.74 | -0.30% | :white_check_mark: |
| whisper-large-encoder | 1 | 10.17 | 10.22 | -0.48% | :white_check_mark: |
| whisper-large-decoder | 1 | 99.91 | 99.81 | 0.10% | :white_check_mark: |
| mistral-7b | 1 | 23.67 | 23.74 | -0.33% | :white_check_mark: |
| FLUX.1-schnell | 1 | 921.79 | 910.85 | 1.20% | :white_check_mark: |
| nan | nan | nan | nan | nan% | :x: |
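
The Diff column appears to be the relative change of the new rate against the old one. The sketch below reproduces three rows under that assumption; `rate_diff_pct` is an illustrative name, not a function from the benchmark tooling, and the thresholds behind the Compare icons are not stated in this report, so they are not modeled.

```python
def rate_diff_pct(rate_new: float, rate_old: float) -> float:
    """Relative change of rate_new versus rate_old, in percent (2 decimals)."""
    return round((rate_new - rate_old) / rate_old * 100.0, 2)


# (test name, rate new, rate old) copied from the table above.
rows = [
    ("torchvision-resnet50", 3236.72, 3254.91),
    ("pytorch-examples-wlang-gru", 346.02, 496.78),
    ("pytorch-examples-wlang-lstm", 480.20, 447.88),
]

for name, new, old in rows:
    print(f"{name}: {rate_diff_pct(new, old):+.2f}%")
```

A negative value means the new commit ran slower than the old one on that test.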
This build is not recommended to merge :red_circle:
:x:bert-mrpc-tf: ERROR - check error output
2025-04-23 15:24:24.082336: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1745439869.653186 162945 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62973 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:32:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1745439870.545578 162945 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-04-23 15:24:39.493475: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493529: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493756: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493795: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493824: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493866: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493907: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-23 15:24:39.493949: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-04-23 15:24:39.495237: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-04-23 15:24:39.496567: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-04-23 15:24:39.496589: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-04-23 15:24:39.496602: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-04-23 15:24:39.496619: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
return fn(*args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 324, in main
y_out = sess.run(y, feed_dict=tf_dict)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':
:red_circle:unet: FAILED: MIGraphX is not within tolerance - check verbose output
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
:x:llama2_7b: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:265: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/llama2_7b/decoder_model.onnx
:x:qwen1.5-7b: ERROR - check error output
usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256
:x:phi3-3.8b: ERROR - check error output
usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256
:red_circle:mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output
:x:#whisper-large-encoder: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/op/convolution.hpp:100: normalize_compute_shape: CONVOLUTION: mismatched channel numbers