Charlie Lin

Results 26 issues of Charlie Lin

* Implements the bitwise_and operator and ONNX parser.
* Needed for support in TorchMIGraphX models

Onnx Operators

From the 22 Feb 2024 performance model review of Distilgpt2: what Paul had suggested but it can go further because pointwise is also used once. e.g. pointwise kernel @55 here...

Perf Improve
Tier1

From the 22 Feb 2024 performance model review of Distilgpt2: There are several gemms that are applied together (this is the tail end of attention): ``` @17 = hip::hip_copy_literal[id=main:@literal:6] -> half_type, {348,...

Perf Improve
Tier1

From the 22 Feb 2024 performance model review of Distilgpt2: Although it might be minor, we could fuse a pointwise with gather so we can get rid of the extra...

Perf Improve
Tier1

From the 22 Feb 2024 performance model review of Distilgpt2: There is a where before the softmax which prevents us from using flash attention: ``` @34 = gpu::code_object[code_object=9224,symbol_name=where_kernel,global=363312,local=1024,](@33,@30,@32) -> half_type,...

Perf Improve
Tier1

Comment out the qlinear_reused matcher because of an accuracy error for quantized resnet50

bugfix

* There's an accuracy error resulting from the `qlinear_reused` matcher in `simplify_qdq`.
* Note that the other half of the quantized resnet50 accuracy issue was from a disconnect between...

bugfix
Perf Improve

* During the migraphx graph optimizations introduction presentation, I showed a situation where we could have used the distributive property of matrix multiplication to produce a more optimized graph than...

enhancement
Perf Improve
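
The distributive rewrite mentioned in the issue above can be illustrated with a small sketch. The issue text is truncated, so the exact graph pattern from the presentation is not known; this only shows the general identity A·B + A·C = A·(B + C), which replaces two matrix multiplications with one. The helpers `matmul` and `matadd` are hypothetical pure-Python stand-ins, not MIGraphX operators.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matadd(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Small integer matrices so the comparison is exact.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[9, 10], [11, 12]]

# Unoptimized form: two matrix multiplications followed by an add.
two_gemms = matadd(matmul(A, B), matmul(A, C))

# Rewritten form: one add followed by a single matrix multiplication.
one_gemm = matmul(A, matadd(B, C))
```

Both forms give the same result here. With floating-point data the two forms can differ by rounding error, which is something an optimizer applying this rewrite would presumably need to tolerate.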

* With our changes to softmax, we no longer use the `log_softmax` instruction that does the log and the softmax in one step.
* We need to make a matcher...

Perf Improve
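
As a rough illustration of the log_softmax issue above, the sketch below compares taking the log of a softmax in two steps with a fused one-step formulation. It is plain Python, not MIGraphX code; the function names are hypothetical, and the design of the matcher itself is not given in the truncated issue text.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax_two_step(xs):
    """Softmax followed by a separate log."""
    return [math.log(p) for p in softmax(xs)]


def log_softmax_fused(xs):
    """One-step form: x - max(x) - log(sum(exp(x - max(x))))."""
    m = max(xs)
    log_sum = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - log_sum for x in xs]


logits = [1.0, 2.0, 3.0]
two_step = log_softmax_two_step(logits)
fused = log_softmax_fused(logits)
```

The two results agree up to rounding. The fused form skips the division and the per-element log, which is the kind of saving that recognizing a log-after-softmax pattern could recover.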