AMDMIGraphX SparseAttention ONNX Contrib Op Implementation

Sep 03 '25 12:09 music-dino

Test	Batch	Rate new 2ed947	Rate old 8177ed	Diff	Compare
torchvision-resnet50	64	3,175.23	3,156.64	0.59%	:white_check_mark:
torchvision-resnet50_fp16	64	6,610.72	6,585.90	0.38%	:white_check_mark:
torchvision-densenet121	32	2,444.37	2,434.16	0.42%	:white_check_mark:
torchvision-densenet121_fp16	32	4,114.20	4,100.96	0.32%	:white_check_mark:
torchvision-inceptionv3	32	1,672.64	1,664.47	0.49%	:white_check_mark:
torchvision-inceptionv3_fp16	32	2,596.43	2,579.29	0.66%	:white_check_mark:
cadene-inceptionv4	16	797.69	794.64	0.38%	:white_check_mark:
cadene-resnext64x4	16	807.08	802.37	0.59%	:white_check_mark:
slim-mobilenet	64	8,237.03	8,205.30	0.39%	:white_check_mark:
slim-nasnetalarge	64	222.79	221.58	0.55%	:white_check_mark:
slim-resnet50v2	64	3,308.52	3,295.13	0.41%	:white_check_mark:
bert-mrpc-onnx	8	1,143.12	1,131.65	1.01%	:white_check_mark:
bert-mrpc-tf	1	479.43	478.53	0.19%	:white_check_mark:
pytorch-examples-wlang-gru	1	295.97	294.77	0.41%	:white_check_mark:
pytorch-examples-wlang-lstm	1	405.78	409.45	-0.90%	:white_check_mark:
torchvision-resnet50_1	1	793.98	800.17	-0.77%	:white_check_mark:
cadene-dpn92_1	1	413.65	411.44	0.54%	:white_check_mark:
cadene-resnext101_1	1	369.96	368.48	0.40%	:white_check_mark:
onnx-taau-downsample	1	398.54	397.45	0.27%	:white_check_mark:
dlrm-criteoterabyte	1	32.04	31.90	0.45%	:white_check_mark:
dlrm-criteoterabyte_fp16	1	51.02	50.96	0.12%	:white_check_mark:
agentmodel	1	9,366.63	9,103.57	2.89%	:white_check_mark:
unet_fp16	2	58.93	58.78	0.27%	:white_check_mark:
resnet50v1_fp16	1	963.57	951.81	1.24%	:white_check_mark:
resnet50v1_int8	1	968.24	969.07	-0.09%	:white_check_mark:
bert_base_cased_fp16	64	1,114.37	1,109.23	0.46%	:white_check_mark:
bert_large_uncased_fp16	32	345.55	343.63	0.56%	:white_check_mark:
bert_large_fp16	1	196.66	196.18	0.24%	:white_check_mark:
distilgpt2_fp16	16	2,106.52	2,093.09	0.64%	:white_check_mark:
yolov5s	1	580.47	580.29	0.03%	:white_check_mark:
tinyllama	1	43.95	43.78	0.39%	:white_check_mark:
vicuna-fastchat	1	45.26	45.11	0.34%	:white_check_mark:
whisper-tiny-encoder	1	411.37	409.17	0.54%	:white_check_mark:
whisper-tiny-decoder	1	412.82	411.02	0.44%	:white_check_mark:
llama2_7b	1	19.17	19.11	0.30%	:white_check_mark:
qwen1.5-7b	1	23.51	23.42	0.42%	:white_check_mark:
phi3-3.8b	1	26.67	26.58	0.35%	:white_check_mark:
mask-rcnn	1	11.93	11.96	-0.23%	:white_check_mark:
llama3-8b	1	21.74	21.67	0.29%	:white_check_mark:
whisper-large-encoder	1	10.22	10.17	0.51%	:white_check_mark:
whisper-large-decoder	1	96.57	95.77	0.83%	:white_check_mark:
mistral-7b	1	23.73	23.63	0.40%	:white_check_mark:
FLUX.1-schnell	1	708.46	702.58	0.84%	:white_check_mark:
nan	nan	nan	nan	nan%	:x:
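
The Diff column appears to be the relative change of the new rate over the old rate, expressed as a percentage. The bot's actual formula is not shown here, so the sketch below is an assumption, and `percent_diff` is a hypothetical helper name; it does reproduce the reported values for the three rows used.

```python
def percent_diff(rate_new: float, rate_old: float) -> float:
    """Relative change of rate_new over rate_old, in percent (assumed formula)."""
    return (rate_new - rate_old) / rate_old * 100.0


# (rate new, rate old) pairs copied from the table above.
rows = {
    "torchvision-resnet50": (3175.23, 3156.64),
    "pytorch-examples-wlang-lstm": (405.78, 409.45),
    "agentmodel": (9366.63, 9103.57),
}

for name, (new, old) in rows.items():
    # Two decimals, matching the precision shown in the Diff column.
    print(f"{name}: {percent_diff(new, old):.2f}%")
```

Under this reading, a negative Diff means the new commit ran slower than the old one on that test.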

This build is not recommended to merge :red_circle:

Sep 03 '25 16:09 migraphx-bot

:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

:x: bert-mrpc-tf: ERROR - check error output

2025-09-03 10:20:56.197188: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
    graph = load_tf_graph(model_name)
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
    graph_def.ParseFromString(f.read())
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
    self._preread_check()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
    self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance

:white_check_mark: unet: PASSED: MIGraphX meets tolerance

:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance

:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance

:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance

:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance

:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

:white_check_mark: llama2_7b: PASSED: MIGraphX meets tolerance

:white_check_mark: qwen1.5-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: phi3-3.8b: PASSED: MIGraphX meets tolerance

:red_circle: mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: llama3-8b: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-large-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: mistral-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: FLUX.1-schnell: PASSED: MIGraphX meets tolerance

Sep 03 '25 16:09 migraphx-bot
