AMDMIGraphX
AMD's graph optimization engine.
While most graphs we encounter are topologically sorted, and the ONNX IR spec specifies that this must be the case for compliance (see https://github.com/onnx/onnx/blob/main/docs/IR.md#graphs), we do encounter non...
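For context, a graph can always be brought into topological order before parsing. Below is a minimal sketch (my own illustration, not MIGraphX's actual ONNX parser) of sorting ONNX nodes with Kahn's algorithm, keyed on which node produces each tensor name:

```
# Sketch: topologically sort ONNX graph nodes with Kahn's algorithm, so a
# parser that assumes sorted input can still handle out-of-order graphs.
from collections import defaultdict, deque

import onnx

def topo_sort_nodes(graph: onnx.GraphProto):
    # Map each tensor name to the index of the node that produces it.
    producer = {out: i for i, node in enumerate(graph.node) for out in node.output}
    indegree = [0] * len(graph.node)
    consumers = defaultdict(list)
    for i, node in enumerate(graph.node):
        for name in node.input:
            # Graph inputs and initializers have no producing node; skip them.
            if name in producer:
                indegree[i] += 1
                consumers[producer[name]].append(i)
    ready = deque(i for i, d in enumerate(indegree) if d == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(graph.node[i])
        for j in consumers[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)
    if len(order) != len(graph.node):
        raise ValueError("graph contains a cycle")
    return order
```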
Need support for the Onnxruntime Attention Contrib op: https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.Attention. Onnxruntime optimizations fuse operators into Attention to speed up inference runs. This consists of the following changes. - [ ]...
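For reference, this is what a fused `com.microsoft.Attention` node looks like when built by hand with `onnx.helper` (input names and head count here are illustrative, not taken from the issue):

```
# Sketch: a com.microsoft.Attention contrib node built with onnx.helper.
from onnx import helper

attention = helper.make_node(
    "Attention",
    inputs=["input", "weight", "bias", "mask_index"],
    outputs=["output"],
    domain="com.microsoft",  # contrib ops live in the Microsoft domain
    num_heads=12,            # required attribute per the contrib-op spec
)
```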
* Found during Inference Model Review meeting
* Seen in bert_base_cased and distilgpt2_fp16 run with our `--fp8` flag, and probably also `--int8`

```
@24 = gpu::code_object[code_object=8920,symbol_name=mlir_quantizelinear_quant_dot_dequantizelinear_add_add,global=1769472,local=256,](@18,@21,@23,@15,@22) -> half_type, {64, 384,...
```
* Found during Inference Model Review meeting
* Seen in bert_base_cased and distilgpt2_fp16 run with our `--fp8` flag, and probably also `--int8`

```
@12 = hip::hip_copy_literal[id=main:@literal:17] -> half_type, {768, 2304},...
```
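A hedged sketch of reproducing these runs through the Python API rather than the driver's `--fp8` flag (the equivalent CLI being `migraphx-driver perf <model> --fp8`). Whether a dedicated fp8 entry point is exposed in Python varies by release, so the snippet uses the long-standing `quantize_fp16` call and treats the fp8 path as an assumption; the model path is hypothetical:

```
# Sketch: parse, quantize, and compile a model, then print the IR; the
# gpu::code_object / hip::hip_copy_literal lines above come from such a dump.
import migraphx

p = migraphx.parse_onnx("bert_base_cased.onnx")  # hypothetical local path
migraphx.quantize_fp16(p)  # fp8 quantization may need the driver's --fp8 flag
p.compile(migraphx.get_target("gpu"))
print(p)
```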
The hipBLASLt team created a document on how to tune your own hipBLASLt calls: https://github.com/ROCm/hipBLASLt/blob/develop/docs/how-to-use-hipblaslt-offline-tuning.rst. After checking with the hipBLASLt team, we found that our way of tuning hipBLASLt is similar to how...
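A hedged sketch of driving that offline tuning around a MIGraphX run. The environment variable names below are my reading of the linked doc and should be verified against your hipBLASLt version before relying on them:

```
# Sketch: record hipBLASLt tuning results during one run, then feed them
# back into later runs. Env var names are assumptions from the linked doc.
import os
import subprocess

env = dict(os.environ)
env["HIPBLASLT_TUNING_FILE"] = "hipblaslt_tuned.txt"  # record tuned solutions
subprocess.run(["migraphx-driver", "perf", "model.onnx"], env=env, check=True)

# Later runs pick the tuned solutions back up:
env2 = dict(os.environ)
env2["HIPBLASLT_TUNING_OVERRIDE_FILE"] = "hipblaslt_tuned.txt"
subprocess.run(["migraphx-driver", "perf", "model.onnx"], env=env2, check=True)
```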
Follow-up from PR #3782. Example resnet quantized graph after the above PR:

```
NEW: q -> conv -> dq -> add -> relu -> q .......... -> q -> conv...
```
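To make the pattern concrete, here is a toy sketch (not the actual compiler pass) of scanning an op chain for the `q -> conv -> dq` window shown above, the kind of subsequence a quantization/fusion pass matches on:

```
# Toy illustration: find occurrences of the q -> conv -> dq pattern in a
# flat op chain, mirroring the NEW layout printed in the issue.
def find_pattern(chain, pattern=("q", "conv", "dq")):
    n = len(pattern)
    return [i for i in range(len(chain) - n + 1) if tuple(chain[i:i + n]) == pattern]

chain = ["q", "conv", "dq", "add", "relu", "q"]
print(find_pattern(chain))  # -> [0]
```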
Integrate with MLIR PR https://github.com/ROCm/rocMLIR/pull/1706 when m, n, k are 1.
Seen after #3587. Take a very simple program:

```
arg0 = @param:arg0 -> float_type, {2, 1, 4}, {4, 4, 1}
@1 = multibroadcast[out_lens={1, 2, 3, 4},out_dyn_dims={}](arg0) -> float_type, {1, 2,...
```
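For anyone wanting to poke at this, the printed program can be rebuilt with the Python API in the same style as the repro below (a sketch; `out_dyn_dims` is omitted since it is empty in the dump):

```
# Sketch: reconstruct the printed multibroadcast program via the Python API.
import migraphx

p = migraphx.program()
m = p.get_main_module()
arg0 = m.add_parameter("arg0", migraphx.shape(type="float_type", lens=[2, 1, 4]))
mb = m.add_instruction(migraphx.op("multibroadcast", out_lens=[1, 2, 3, 4]), [arg0])
print(p)
```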
## Repro

```
# fuse_reduce.py
import numpy as np
import migraphx

p = migraphx.program()
m = p.get_main_module()
s1 = migraphx.shape(type="float_type", lens=[1, 24, 4608, 128])
x0 = m.add_parameter("x0", s1)
x1 =...
```
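The preview truncates at `x1 =...` and the missing lines are not reconstructed here, but the usual tail of such a repro script looks like this (a sketch; the `reduce_sum` instruction standing in for the elided body is hypothetical, chosen only because the file is named `fuse_reduce.py`):

```
# Sketch: generic compile-and-run tail for a migraphx Python repro.
import numpy as np
import migraphx

p = migraphx.program()
m = p.get_main_module()
s1 = migraphx.shape(type="float_type", lens=[1, 24, 4608, 128])
x0 = m.add_parameter("x0", s1)
r = m.add_instruction(migraphx.op("reduce_sum", axes=[3]), [x0])  # hypothetical op
m.add_return([r])

p.compile(migraphx.get_target("gpu"))
result = p.run({"x0": migraphx.argument(np.random.rand(1, 24, 4608, 128).astype(np.float32))})
```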