AMDMIGraphX

AMD's graph optimization engine.

Results: 433 AMDMIGraphX issues

To improve performance for transpose kernels, we should load the transposed inputs into LDS directly and then read from LDS instead. We have functions like `preload_copy` which will do this...

Perf Improve
Tier1
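
The staging idea in the issue above can be illustrated with a minimal CPU-only sketch. It assumes a row-major matrix and uses a small tile buffer as a stand-in for LDS. The name `tiled_transpose` and its `tile` parameter are hypothetical and are not part of MIGraphX. This is not the `preload_copy` implementation, only the access pattern it is meant to enable: contiguous reads into the staging buffer, then contiguous writes out of it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a MIGraphX API): transposes a rows x cols
// row-major matrix one tile at a time. The `staging` buffer plays the
// role that LDS would play inside a GPU kernel.
std::vector<float> tiled_transpose(const std::vector<float>& in,
                                   std::size_t rows,
                                   std::size_t cols,
                                   std::size_t tile = 16)
{
    std::vector<float> out(in.size());
    std::vector<float> staging(tile * tile);
    for(std::size_t r0 = 0; r0 < rows; r0 += tile)
    {
        for(std::size_t c0 = 0; c0 < cols; c0 += tile)
        {
            // Clamp the tile at the matrix edges.
            const std::size_t th = std::min(tile, rows - r0);
            const std::size_t tw = std::min(tile, cols - c0);
            // Step 1: copy the input tile into staging, reading each
            // input row contiguously.
            for(std::size_t i = 0; i < th; ++i)
                for(std::size_t j = 0; j < tw; ++j)
                    staging[i * tile + j] = in[(r0 + i) * cols + (c0 + j)];
            // Step 2: read staging column-wise so that each output row
            // is written contiguously.
            for(std::size_t j = 0; j < tw; ++j)
                for(std::size_t i = 0; i < th; ++i)
                    out[(c0 + j) * rows + (r0 + i)] = staging[i * tile + j];
        }
    }
    return out;
}
```

In a real kernel the strided access would land on LDS rather than on global memory, which is the point of staging; the tile size here is an arbitrary choice for illustration.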

Example series of instructions found in Longformer:
```
@145 = hip::copy(@121,@144) -> half_type, {4, 4, 256, 513}, {525312, 131328, 513, 1}: 0.0193204ms, 1%
@146 = load[offset=8404992,end=12607488](@1) -> half_type, {4, 4,...
```

Perf Improve

Fuse average pooling with convolution
```
@77 = gpu::code_object[code_object=9344,symbol_name=pad_kernel,global=262848,local=1024,](@57,@76) -> float_type, {1, 192, 37, 37}, {262848, 1369, 37, 1}
@78 = load[offset=705600,end=1646400](@1) -> float_type, {1, 192, 35, 35}, {235200, 1225,...
```

Perf Improve

From the 22 Feb 2024 performance model review of Distilgpt2: what Paul had suggested but it can go further because pointwise is also used once. e.g. pointwise kernel @55 here...

Perf Improve
Tier1

From the 22 Feb 2024 performance model review of Distilgpt2: There are several gemms that are applied together (this is the tail end of attention):
```
@17 = hip::hip_copy_literal[id=main:@literal:6] -> half_type, {348,...
```

Perf Improve
Tier1

From the 22 Feb 2024 performance model review of Distilgpt2: Although it might be minor, we could fuse a pointwise with gather so we can get rid of the extra...

Perf Improve
Tier1

From the 22 Feb 2024 performance model review of Distilgpt2: There is a `where` before the softmax which prevents us from using flash attention:
```
@34 = gpu::code_object[code_object=9224,symbol_name=where_kernel,global=363312,local=1024,](@33,@30,@32) -> half_type,...
```

Perf Improve
Tier1

Add additional flags to the MIGraphX Driver perf to allow for different timing methodologies to match how we run a model through onnxruntime. Handling things this way allows us to...

onnxruntime
dependencies

See discussions here: https://github.com/ROCm/AMDMIGraphX/pull/3299#issuecomment-2246075234