Add a reduce_sum heuristic for the AddN op when parsing TF pb files
Motivation
Provide a heuristic parsing solution for the AddN op when parsing TF models, and support the concat op when its inputs mix static and dynamic shapes.
Technical Details
When parsing the AddN op, replace the chain of add ops with a single reduce_sum op.
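A minimal standalone sketch of why the two forms agree, written in plain C++ rather than against the MIGraphX API. It assumes the N inputs share one shape and are stacked along a new leading axis before the reduction. The stacking step and the names `tensor`, `addn_chain` and `addn_reduce_sum` are illustrative assumptions, not taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat tensor; all inputs are assumed to share one shape.
using tensor = std::vector<float>;

// Chain addition: ((x0 + x1) + x2) + ..., which needs N-1 separate add steps.
tensor addn_chain(const std::vector<tensor>& inputs)
{
    assert(!inputs.empty());
    tensor acc = inputs.front();
    for(std::size_t i = 1; i < inputs.size(); ++i)
    {
        assert(inputs[i].size() == acc.size());
        for(std::size_t j = 0; j < acc.size(); ++j)
            acc[j] += inputs[i][j];
    }
    return acc;
}

// Assumed alternative: stack the inputs into an [N, elements] buffer,
// then sum over axis 0 in a single reduction.
tensor addn_reduce_sum(const std::vector<tensor>& inputs)
{
    assert(!inputs.empty());
    const std::size_t n        = inputs.size();
    const std::size_t elements = inputs.front().size();

    // Stack: row i of the buffer holds input i.
    tensor stacked;
    stacked.reserve(n * elements);
    for(const auto& t : inputs)
    {
        assert(t.size() == elements);
        stacked.insert(stacked.end(), t.begin(), t.end());
    }

    // Reduce over axis 0.
    tensor out(elements, 0.0f);
    for(std::size_t i = 0; i < n; ++i)
        for(std::size_t j = 0; j < elements; ++j)
            out[j] += stacked[i * elements + j];
    return out;
}
```

Calling both functions on the same list of inputs should give the same element-wise sums. The reduction form replaces N-1 add instructions with a single reduce instruction over the stacked buffer.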
For Concat: if the inputs mix static and dynamic shapes, convert every input shape to dynamic, then contract the output shape back to static at the end if possible. The parser also computes the common non-axis dimensions to bound the output.
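The shape handling described above can be sketched in standalone C++ as follows. This is an illustration under stated assumptions, not the PR's implementation: `dim_range`, `dyn_shape`, `to_dynamic`, `concat_shape` and `is_static` are hypothetical names, each dimension is modelled as a closed range `[min, max]`, and "common non-axis dims" is interpreted here as the intersection of the input ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a dynamic dimension: a closed range [min, max].
struct dim_range
{
    std::size_t min;
    std::size_t max;
    bool is_fixed() const { return min == max; }
};

using dyn_shape = std::vector<dim_range>;

// Promote a static shape to dynamic form: each length d becomes [d, d].
dyn_shape to_dynamic(const std::vector<std::size_t>& lens)
{
    dyn_shape out;
    out.reserve(lens.size());
    for(auto d : lens)
        out.push_back({d, d});
    return out;
}

// Output shape of a concat along `axis`, with every input in dynamic form.
dyn_shape concat_shape(const std::vector<dyn_shape>& inputs, std::size_t axis)
{
    assert(!inputs.empty());
    dyn_shape out = inputs.front();
    assert(axis < out.size());
    for(std::size_t i = 1; i < inputs.size(); ++i)
    {
        const auto& in = inputs[i];
        if(in.size() != out.size())
            throw std::invalid_argument("concat: rank mismatch");
        for(std::size_t d = 0; d < out.size(); ++d)
        {
            if(d == axis)
            {
                // Lengths add up along the concat axis.
                out[d].min += in[d].min;
                out[d].max += in[d].max;
            }
            else
            {
                // Non-axis dims must agree, so keep only the common range.
                out[d].min = std::max(out[d].min, in[d].min);
                out[d].max = std::min(out[d].max, in[d].max);
                if(out[d].min > out[d].max)
                    throw std::invalid_argument("concat: incompatible non-axis dims");
            }
        }
    }
    return out;
}

// The result can be contracted back to a static shape when every range is fixed.
bool is_static(const dyn_shape& s)
{
    return std::all_of(s.begin(), s.end(), [](const dim_range& d) { return d.is_fixed(); });
}
```

For example, concatenating a static `{2, 3}` input with a dynamic `{[2, 2], [1, 8]}` input along axis 0 narrows the second dimension to `[3, 3]`, so the output in this sketch is fully fixed and can be reported as the static shape `{4, 3}`.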
Test Plan
Add test cases in ref and tf/parse
- test/tf/tests/addn_test.cpp
- test/ref/add.cpp
Test Result
Submission Checklist
- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
| Test | Batch | Rate new cd7e8a | Rate old 397919 | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,160.32 | 3,245.86 | -2.64% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,595.68 | 6,951.81 | -5.12% | :red_circle: |
| torchvision-densenet121 | 32 | 2,434.48 | 2,449.22 | -0.60% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,100.40 | 4,167.34 | -1.61% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,665.16 | 1,635.29 | 1.83% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,582.21 | 2,759.38 | -6.42% | :red_circle: |
| cadene-inceptionv4 | 16 | 794.19 | 770.72 | 3.05% | :high_brightness: |
| cadene-resnext64x4 | 16 | 802.60 | 817.99 | -1.88% | :white_check_mark: |
| slim-mobilenet | 64 | 8,210.79 | 7,456.32 | 10.12% | :high_brightness: |
| slim-nasnetalarge | 64 | 221.72 | 210.95 | 5.11% | :high_brightness: |
| slim-resnet50v2 | 64 | 3,297.22 | 3,341.58 | -1.33% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,131.58 | 1,144.86 | -1.16% | :white_check_mark: |
| bert-mrpc-tf | 1 | 480.89 | 445.07 | 8.05% | :high_brightness: |
| pytorch-examples-wlang-gru | 1 | 297.10 | 299.79 | -0.90% | :white_check_mark: |
| pytorch-examples-wlang-lstm | 1 | 412.36 | 399.30 | 3.27% | :high_brightness: |
| torchvision-resnet50_1 | 1 | 798.59 | 761.18 | 4.92% | :high_brightness: |
| cadene-dpn92_1 | 1 | 411.51 | 384.27 | 7.09% | :high_brightness: |
| cadene-resnext101_1 | 1 | 360.78 | 391.94 | -7.95% | :red_circle: |
| onnx-taau-downsample | 1 | 396.77 | 395.58 | 0.30% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 31.90 | 33.78 | -5.55% | :red_circle: |
| dlrm-criteoterabyte_fp16 | 1 | 50.94 | 51.23 | -0.56% | :white_check_mark: |
| agentmodel | 1 | 8,718.02 | 9,034.65 | -3.50% | :red_circle: |
| unet_fp16 | 2 | 58.73 | 59.18 | -0.77% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 976.71 | 989.25 | -1.27% | :white_check_mark: |
| resnet50v1_int8 | 1 | 970.28 | 1,022.00 | -5.06% | :red_circle: |
| bert_base_cased_fp16 | 64 | 1,109.36 | 1,106.73 | 0.24% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 343.68 | 345.26 | -0.46% | :white_check_mark: |
| bert_large_fp16 | 1 | 197.32 | 197.13 | 0.09% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,096.42 | 2,115.80 | -0.92% | :white_check_mark: |
| yolov5s | 1 | 581.41 | 576.03 | 0.93% | :white_check_mark: |
| tinyllama | 1 | 43.75 | 43.97 | -0.49% | :white_check_mark: |
| vicuna-fastchat | 1 | 45.06 | 45.28 | -0.48% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 409.30 | 417.53 | -1.97% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 411.18 | 408.53 | 0.65% | :white_check_mark: |
| llama2_7b | 1 | 19.11 | 19.16 | -0.26% | :white_check_mark: |
| qwen1.5-7b | 1 | 23.44 | 23.51 | -0.32% | :white_check_mark: |
| phi3-3.8b | 1 | 26.53 | 26.67 | -0.53% | :white_check_mark: |
| mask-rcnn | 1 | 12.08 | 12.01 | 0.60% | :white_check_mark: |
| llama3-8b | 1 | 21.65 | 21.72 | -0.32% | :white_check_mark: |
| whisper-large-encoder | 1 | 10.16 | 10.21 | -0.48% | :white_check_mark: |
| whisper-large-decoder | 1 | 96.84 | 95.77 | 1.12% | :white_check_mark: |
| mistral-7b | 1 | 23.62 | 23.72 | -0.41% | :white_check_mark: |
| FLUX.1-schnell | 1 | 713.58 | 746.70 | -4.44% | :red_circle: |
| nan | nan | nan | nan | nan% | :x: |
This build is not recommended to merge :red_circle:
:x:bert-mrpc-tf: ERROR - check error output
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]
error: unknown warning option '-Wnrvo' [-Werror,-Wunknown-warning-option]
2025-08-29 00:03:16.115752: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
graph = load_tf_graph(model_name)
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
graph_def.ParseFromString(f.read())
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
self._preread_check()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
:red_circle:mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output