AMDMIGraphX
AMD's graph optimization engine.
While most graphs we encounter are topologically sorted, and the ONNX IR spec specifies that this must be the case for compliance (see https://github.com/onnx/onnx/blob/main/docs/IR.md#graphs), we do encounter non...
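For context, a graph can always be brought into topological order before parsing. Below is a minimal sketch (my own illustration, not MIGraphX's actual ONNX parser) of sorting ONNX nodes with Kahn's algorithm, keyed on which node produces each tensor name:

```
# Sketch: topologically sort ONNX graph nodes with Kahn's algorithm, so a
# parser that assumes sorted input can still handle out-of-order graphs.
from collections import defaultdict, deque

import onnx

def topo_sort_nodes(graph: onnx.GraphProto):
    # Map each tensor name to the index of the node that produces it.
    producer = {out: i for i, node in enumerate(graph.node) for out in node.output}
    indegree = [0] * len(graph.node)
    consumers = defaultdict(list)
    for i, node in enumerate(graph.node):
        for name in node.input:
            # Graph inputs and initializers have no producing node; skip them.
            if name in producer:
                indegree[i] += 1
                consumers[producer[name]].append(i)
    ready = deque(i for i, d in enumerate(indegree) if d == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(graph.node[i])
        for j in consumers[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)
    if len(order) != len(graph.node):
        raise ValueError("graph contains a cycle")
    return order
```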
Need support for the Onnxruntime Attention Contrib op: https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.Attention. Onnxruntime optimizations fuse operators into Attention to speed up inference runs. This consists of the following changes. - [ ]...
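For reference, this is what a fused `com.microsoft.Attention` node looks like when built by hand with `onnx.helper` (input names and head count here are illustrative, not taken from the issue):

```
# Sketch: a com.microsoft.Attention contrib node built with onnx.helper.
from onnx import helper

attention = helper.make_node(
    "Attention",
    inputs=["input", "weight", "bias", "mask_index"],
    outputs=["output"],
    domain="com.microsoft",  # contrib ops live in the Microsoft domain
    num_heads=12,            # required attribute per the contrib-op spec
)
```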
* Found during Inference Model Review meeting
* Seen in bert_base_cased and distilgpt2_fp16 run with our `--fp8` flag, and probably also `--int8`

```
@24 = gpu::code_object[code_object=8920,symbol_name=mlir_quantizelinear_quant_dot_dequantizelinear_add_add,global=1769472,local=256,](@18,@21,@23,@15,@22) -> half_type, {64, 384,...
```
* Found during Inference Model Review meeting
* Seen in bert_base_cased and distilgpt2_fp16 run with our `--fp8` flag, and probably also `--int8`

```
@12 = hip::hip_copy_literal[id=main:@literal:17] -> half_type, {768, 2304},...
```
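A hedged sketch of reproducing these runs through the Python API rather than the driver's `--fp8` flag (the equivalent CLI being `migraphx-driver perf <model> --fp8`). Whether a dedicated fp8 entry point is exposed in Python varies by release, so the snippet uses the long-standing `quantize_fp16` call and treats the fp8 path as an assumption; the model path is hypothetical:

```
# Sketch: parse, quantize, and compile a model, then print the IR; the
# gpu::code_object / hip::hip_copy_literal lines above come from such a dump.
import migraphx

p = migraphx.parse_onnx("bert_base_cased.onnx")  # hypothetical local path
migraphx.quantize_fp16(p)  # fp8 quantization may need the driver's --fp8 flag
p.compile(migraphx.get_target("gpu"))
print(p)
```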
The hipBLASLt team created a document on how to tune your own hipBLASLt calls: https://github.com/ROCm/hipBLASLt/blob/develop/docs/how-to-use-hipblaslt-offline-tuning.rst. After checking with the hipBLASLt team, we found that our way of tuning hipBLASLt is similar to how...
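A hedged sketch of driving that offline tuning around a MIGraphX run. The environment variable names below are my reading of the linked doc and should be verified against your hipBLASLt version before relying on them:

```
# Sketch: record hipBLASLt tuning results during one run, then feed them
# back into later runs. Env var names are assumptions from the linked doc.
import os
import subprocess

env = dict(os.environ)
env["HIPBLASLT_TUNING_FILE"] = "hipblaslt_tuned.txt"  # record tuned solutions
subprocess.run(["migraphx-driver", "perf", "model.onnx"], env=env, check=True)

# Later runs pick the tuned solutions back up:
env2 = dict(os.environ)
env2["HIPBLASLT_TUNING_OVERRIDE_FILE"] = "hipblaslt_tuned.txt"
subprocess.run(["migraphx-driver", "perf", "model.onnx"], env=env2, check=True)
```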
Follow-up from PR #3782. Example resnet quantized graph after the above PR:

```
NEW: q -> conv -> dq -> add -> relu -> q .......... -> q -> conv...
```
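To make the pattern concrete, here is a toy sketch (not the actual compiler pass) of scanning an op chain for the `q -> conv -> dq` window shown above, the kind of subsequence a quantization/fusion pass matches on:

```
# Toy illustration: find occurrences of the q -> conv -> dq pattern in a
# flat op chain, mirroring the NEW layout printed in the issue.
def find_pattern(chain, pattern=("q", "conv", "dq")):
    n = len(pattern)
    return [i for i in range(len(chain) - n + 1) if tuple(chain[i:i + n]) == pattern]

chain = ["q", "conv", "dq", "add", "relu", "q"]
print(find_pattern(chain))  # -> [0]
```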
Integrate with MLIR PR https://github.com/ROCm/rocMLIR/pull/1706 when m, n, k are 1.
Seen after #3587. Take a very simple program:

```
arg0 = @param:arg0 -> float_type, {2, 1, 4}, {4, 4, 1}
@1 = multibroadcast[out_lens={1, 2, 3, 4},out_dyn_dims={}](arg0) -> float_type, {1, 2,...
```
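For anyone wanting to poke at this, the printed program can be rebuilt with the Python API in the same style as the repro below (a sketch; `out_dyn_dims` is omitted since it is empty in the dump):

```
# Sketch: reconstruct the printed multibroadcast program via the Python API.
import migraphx

p = migraphx.program()
m = p.get_main_module()
arg0 = m.add_parameter("arg0", migraphx.shape(type="float_type", lens=[2, 1, 4]))
mb = m.add_instruction(migraphx.op("multibroadcast", out_lens=[1, 2, 3, 4]), [arg0])
print(p)
```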
## Repro

```
# fuse_reduce.py
import numpy as np
import migraphx

p = migraphx.program()
m = p.get_main_module()
s1 = migraphx.shape(type="float_type", lens=[1, 24, 4608, 128])
x0 = m.add_parameter("x0", s1)
x1 =...
```
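The preview truncates at `x1 =...` and the missing lines are not reconstructed here, but the usual tail of such a repro script looks like this (a sketch; the `reduce_sum` instruction standing in for the elided body is hypothetical, chosen only because the file is named `fuse_reduce.py`):

```
# Sketch: generic compile-and-run tail for a migraphx Python repro.
import numpy as np
import migraphx

p = migraphx.program()
m = p.get_main_module()
s1 = migraphx.shape(type="float_type", lens=[1, 24, 4608, 128])
x0 = m.add_parameter("x0", s1)
r = m.add_instruction(migraphx.op("reduce_sum", axes=[3]), [x0])  # hypothetical op
m.add_return([r])

p.compile(migraphx.get_target("gpu"))
result = p.run({"x0": migraphx.argument(np.random.rand(1, 24, 4608, 128).astype(np.float32))})
```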