AMDMIGraphX
SparseAttention ONNX Contrib Op Implementation
| Test | Batch | Rate new 2ed947 | Rate old 8177ed | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,175.23 | 3,156.64 | 0.59% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,610.72 | 6,585.90 | 0.38% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,444.37 | 2,434.16 | 0.42% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,114.20 | 4,100.96 | 0.32% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,672.64 | 1,664.47 | 0.49% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,596.43 | 2,579.29 | 0.66% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 797.69 | 794.64 | 0.38% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 807.08 | 802.37 | 0.59% | :white_check_mark: |
| slim-mobilenet | 64 | 8,237.03 | 8,205.30 | 0.39% | :white_check_mark: |
| slim-nasnetalarge | 64 | 222.79 | 221.58 | 0.55% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,308.52 | 3,295.13 | 0.41% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,143.12 | 1,131.65 | 1.01% | :white_check_mark: |
| bert-mrpc-tf | 1 | 479.43 | 478.53 | 0.19% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 295.97 | 294.77 | 0.41% | :white_check_mark: |
| pytorch-examples-wlang-lstm | 1 | 405.78 | 409.45 | -0.90% | :white_check_mark: |
| torchvision-resnet50_1 | 1 | 793.98 | 800.17 | -0.77% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 413.65 | 411.44 | 0.54% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 369.96 | 368.48 | 0.40% | :white_check_mark: |
| onnx-taau-downsample | 1 | 398.54 | 397.45 | 0.27% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 32.04 | 31.90 | 0.45% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 51.02 | 50.96 | 0.12% | :white_check_mark: |
| agentmodel | 1 | 9,366.63 | 9,103.57 | 2.89% | :white_check_mark: |
| unet_fp16 | 2 | 58.93 | 58.78 | 0.27% | :white_check_mark: |
| resnet50v1_fp16 | 1 | 963.57 | 951.81 | 1.24% | :white_check_mark: |
| resnet50v1_int8 | 1 | 968.24 | 969.07 | -0.09% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,114.37 | 1,109.23 | 0.46% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 345.55 | 343.63 | 0.56% | :white_check_mark: |
| bert_large_fp16 | 1 | 196.66 | 196.18 | 0.24% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,106.52 | 2,093.09 | 0.64% | :white_check_mark: |
| yolov5s | 1 | 580.47 | 580.29 | 0.03% | :white_check_mark: |
| tinyllama | 1 | 43.95 | 43.78 | 0.39% | :white_check_mark: |
| vicuna-fastchat | 1 | 45.26 | 45.11 | 0.34% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 411.37 | 409.17 | 0.54% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 412.82 | 411.02 | 0.44% | :white_check_mark: |
| llama2_7b | 1 | 19.17 | 19.11 | 0.30% | :white_check_mark: |
| qwen1.5-7b | 1 | 23.51 | 23.42 | 0.42% | :white_check_mark: |
| phi3-3.8b | 1 | 26.67 | 26.58 | 0.35% | :white_check_mark: |
| mask-rcnn | 1 | 11.93 | 11.96 | -0.23% | :white_check_mark: |
| llama3-8b | 1 | 21.74 | 21.67 | 0.29% | :white_check_mark: |
| whisper-large-encoder | 1 | 10.22 | 10.17 | 0.51% | :white_check_mark: |
| whisper-large-decoder | 1 | 96.57 | 95.77 | 0.83% | :white_check_mark: |
| mistral-7b | 1 | 23.73 | 23.63 | 0.40% | :white_check_mark: |
| FLUX.1-schnell | 1 | 708.46 | 702.58 | 0.84% | :white_check_mark: |
| nan | nan | nan | nan | nan% | :x: |
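
The Diff column appears to be the relative change of the new rate against the old rate, in percent. The sketch below is an assumption inferred from the table values, not the benchmark tool's actual code; the function name `percent_diff` is hypothetical. It reproduces the Diff values shown for the three sample rows.

```python
def percent_diff(new_rate: float, old_rate: float) -> float:
    """Relative change of new_rate versus old_rate, in percent.

    Assumed formula: (new - old) / old * 100, inferred from the table.
    """
    return (new_rate - old_rate) / old_rate * 100.0


# (new rate, old rate) pairs copied from the table above
rows = {
    "torchvision-resnet50": (3175.23, 3156.64),
    "pytorch-examples-wlang-lstm": (405.78, 409.45),
    "agentmodel": (9366.63, 9103.57),
}

for name, (new, old) in rows.items():
    print(f"{name}: {percent_diff(new, old):+.2f}%")
```

A positive value means the new commit measured a higher rate than the old one; a negative value means it measured lower.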
This build is not recommended to merge :red_circle:
:x:bert-mrpc-tf: ERROR - check error output
2025-09-03 10:20:56.197188: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 359, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 306, in main
graph = load_tf_graph(model_name)
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 300, in load_tf_graph
graph_def.ParseFromString(f.read())
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
self._preread_check()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/new-saved-models/tf-misc/bert_mrpc1.pb')
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
:red_circle:mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output