AMDMIGraphX Add the heuristic of AddN op using reduce

Motivation

Provide a heuristic for parsing the AddN op in the TensorFlow parser, and support the concat op when its inputs mix static and dynamic shapes.

Technical Details

AddN: parse the op as a single reduce_sum op instead of a chain of add ops.
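A minimal sketch of why the two formulations agree, in plain Python on 1-D lists rather than MIGraphX instructions. The helper names `addn_chain` and `addn_reduce_sum` are hypothetical, and how the parser actually builds the stacked tensor is not stated in this PR, so the stacking step here is an assumption.

```python
from functools import reduce


def addn_chain(inputs):
    """Chained pairwise addition: ((x0 + x1) + x2) + ..."""
    return reduce(lambda acc, x: [a + b for a, b in zip(acc, x)], inputs)


def addn_reduce_sum(inputs):
    """Stack the N inputs on a new leading axis, then sum over that axis."""
    stacked = [list(x) for x in inputs]  # conceptual shape: (N, len)
    return [sum(column) for column in zip(*stacked)]


xs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
assert addn_chain(xs) == addn_reduce_sum(xs) == [111, 222, 333]
```

The chain form emits N-1 add ops, while the reduce form emits one reduction over the stacked axis regardless of N.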

Concat: if the inputs mix static and dynamic shapes, convert all of them to dynamic shapes, then contract the result back to a static shape at the end if possible. The common non-axis dims are also computed to bound the output shape.
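The shape handling described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: a dynamic dim is modeled as a `(min, max)` tuple, a static dim as an int, "common non-axis dims" is read as the intersection of the input ranges, and the names `to_dynamic` and `concat_shape` are hypothetical.

```python
def to_dynamic(shape):
    """Lift a shape to dynamic form: every dim becomes a (min, max) pair."""
    return [d if isinstance(d, tuple) else (d, d) for d in shape]


def concat_shape(shapes, axis):
    """Compute the output shape of concat over mixed static/dynamic inputs."""
    dyn = [to_dynamic(s) for s in shapes]
    rank = len(dyn[0])
    if any(len(s) != rank for s in dyn):
        raise ValueError("all inputs must have the same rank")
    out = []
    for i in range(rank):
        mins = [s[i][0] for s in dyn]
        maxs = [s[i][1] for s in dyn]
        if i == axis:
            # Along the concat axis the extents add up.
            out.append((sum(mins), sum(maxs)))
        else:
            # Off-axis dims must agree, so keep only the common range.
            lo, hi = max(mins), min(maxs)
            if lo > hi:
                raise ValueError(f"incompatible sizes on dim {i}")
            out.append((lo, hi))
    # Contract back to a static shape when every dim is fixed.
    if all(lo == hi for lo, hi in out):
        return [lo for lo, _ in out]
    return out


# The static input pins dim 0 to 2, so the result is fully static.
assert concat_shape([[(1, 4), 3], [2, 3]], axis=1) == [2, 6]
# Two dynamic inputs: dim 0 stays a range, bounded by the overlap.
assert concat_shape([[(1, 4), 3], [(2, 8), 5]], axis=1) == [(2, 4), (8, 8)]
```

In the first example a single static input is enough to bound the dynamic one, which is the case where contracting back to a static shape succeeds.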

Test Plan

Add test cases in ref and tf/parse

test/tf/tests/addn_test.cpp
test/ref/add.cpp

Test Result

Submission Checklist

[ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Aug 20 '25 19:08 kentqian

Test	Batch	Rate new cd7e8a	Rate old 397919	Diff	Compare
torchvision-resnet50	64	3,160.32	3,245.86	-2.64%	:white_check_mark:
torchvision-resnet50_fp16	64	6,595.68	6,951.81	-5.12%	:red_circle:
torchvision-densenet121	32	2,434.48	2,449.22	-0.60%	:white_check_mark:
torchvision-densenet121_fp16	32	4,100.40	4,167.34	-1.61%	:white_check_mark:
torchvision-inceptionv3	32	1,665.16	1,635.29	1.83%	:white_check_mark:
torchvision-inceptionv3_fp16	32	2,582.21	2,759.38	-6.42%	:red_circle:
cadene-inceptionv4	16	794.19	770.72	3.05%	:high_brightness:
cadene-resnext64x4	16	802.60	817.99	-1.88%	:white_check_mark:
slim-mobilenet	64	8,210.79	7,456.32	10.12%	:high_brightness:
slim-nasnetalarge	64	221.72	210.95	5.11%	:high_brightness:
slim-resnet50v2	64	3,297.22	3,341.58	-1.33%	:white_check_mark:
bert-mrpc-onnx	8	1,131.58	1,144.86	-1.16%	:white_check_mark:
bert-mrpc-tf	1	480.89	445.07	8.05%	:high_brightness:
pytorch-examples-wlang-gru	1	297.10	299.79	-0.90%	:white_check_mark:
pytorch-examples-wlang-lstm	1	412.36	399.30	3.27%	:high_brightness:
torchvision-resnet50_1	1	798.59	761.18	4.92%	:high_brightness:
cadene-dpn92_1	1	411.51	384.27	7.09%	:high_brightness:
cadene-resnext101_1	1	360.78	391.94	-7.95%	:red_circle:
onnx-taau-downsample	1	396.77	395.58	0.30%	:white_check_mark:
dlrm-criteoterabyte	1	31.90	33.78	-5.55%	:red_circle:
dlrm-criteoterabyte_fp16	1	50.94	51.23	-0.56%	:white_check_mark:
agentmodel	1	8,718.02	9,034.65	-3.50%	:red_circle:
unet_fp16	2	58.73	59.18	-0.77%	:white_check_mark:
resnet50v1_fp16	1	976.71	989.25	-1.27%	:white_check_mark:
resnet50v1_int8	1	970.28	1,022.00	-5.06%	:red_circle:
bert_base_cased_fp16	64	1,109.36	1,106.73	0.24%	:white_check_mark:
bert_large_uncased_fp16	32	343.68	345.26	-0.46%	:white_check_mark:
bert_large_fp16	1	197.32	197.13	0.09%	:white_check_mark:
distilgpt2_fp16	16	2,096.42	2,115.80	-0.92%	:white_check_mark:
yolov5s	1	581.41	576.03	0.93%	:white_check_mark:
tinyllama	1	43.75	43.97	-0.49%	:white_check_mark:
vicuna-fastchat	1	45.06	45.28	-0.48%	:white_check_mark:
whisper-tiny-encoder	1	409.30	417.53	-1.97%	:white_check_mark:
whisper-tiny-decoder	1	411.18	408.53	0.65%	:white_check_mark:
llama2_7b	1	19.11	19.16	-0.26%	:white_check_mark:
qwen1.5-7b	1	23.44	23.51	-0.32%	:white_check_mark:
phi3-3.8b	1	26.53	26.67	-0.53%	:white_check_mark:
mask-rcnn	1	12.08	12.01	0.60%	:white_check_mark:
llama3-8b	1	21.65	21.72	-0.32%	:white_check_mark:
whisper-large-encoder	1	10.16	10.21	-0.48%	:white_check_mark:
whisper-large-decoder	1	96.84	95.77	1.12%	:white_check_mark:
mistral-7b	1	23.62	23.72	-0.41%	:white_check_mark:
FLUX.1-schnell	1	713.58	746.70	-4.44%	:red_circle:
nan	nan	nan	nan	nan%	:x:

This build is not recommended to merge :red_circle:

Aug 29 '25 06:08 migraphx-bot

:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

:x: bert-mrpc-tf: ERROR - check error output

error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]

2025-08-29 00:03:16.115752: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
    graph = load_tf_graph(model_name)
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
    graph_def.ParseFromString(f.read())
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
    self._preread_check()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
    self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance

:white_check_mark: unet: PASSED: MIGraphX meets tolerance

:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance

:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance

:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance

:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance

:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

:white_check_mark: llama2_7b: PASSED: MIGraphX meets tolerance

:white_check_mark: qwen1.5-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: phi3-3.8b: PASSED: MIGraphX meets tolerance

:red_circle: mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: llama3-8b: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-large-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: mistral-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: FLUX.1-schnell: PASSED: MIGraphX meets tolerance

Aug 29 '25 06:08 migraphx-bot

Add the heuristic of AddN op using reduce_sum op for parsing pb file (TF)
