Charlie Lin
* Implements the bitwise_and operator and its ONNX parser. * Needed for bitwise_and support in TorchMIGraphX models
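A minimal NumPy sketch of the elementwise semantics the operator needs to match (the ONNX BitwiseAnd op: integer inputs, NumPy-style broadcasting); `bitwise_and_ref` is a hypothetical reference function, not MIGraphX code:

```python
import numpy as np

def bitwise_and_ref(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Elementwise AND with multidirectional broadcasting,
    # as the ONNX BitwiseAnd op (opset 18) specifies.
    return np.bitwise_and(a, b)

a = np.array([[0b1100, 0b1010]], dtype=np.int32)   # shape (1, 2)
b = np.array([[0b1010], [0b0110]], dtype=np.int32)  # shape (2, 1), broadcasts to (2, 2)
out = bitwise_and_ref(a, b)
```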
From the 22 Feb 2024 performance model review of Distilgpt2: this is what Paul had suggested, but it can go further because the pointwise is also used only once, e.g. the pointwise kernel @55 here...
From the 22 Feb 2024 performance model review of Distilgpt2: There are several gemms that are applied together (this is the tail end of attention): ``` @17 = hip::hip_copy_literal[id=main:@literal:6] -> half_type, {348,...
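A sketch of the rewrite this suggests, assuming the gemms share the same input (as Q/K/V projections at the end of attention do): several gemms on one input are equivalent to a single wider gemm over concatenated weights, followed by a split. The shapes here are illustrative, not from the trace:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w_q, w_k, w_v = (rng.standard_normal((8, 8)).astype(np.float32) for _ in range(3))

# Three separate gemms on the same input...
q, k, v = x @ w_q, x @ w_k, x @ w_v

# ...equal one fused gemm over the horizontally concatenated weights,
# followed by a cheap split of the output columns.
fused = x @ np.concatenate([w_q, w_k, w_v], axis=1)
q2, k2, v2 = np.split(fused, 3, axis=1)
```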
From the 22 Feb 2024 performance model review of Distilgpt2: Although it might be minor, we could fuse a pointwise with gather so we can get rid of the extra...
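The fusion is legal because an elementwise (pointwise) op commutes with gather: applying it after the gather touches exactly the gathered elements, so it can run inside the gather kernel instead of as a separate launch. A small sketch, using relu as a stand-in pointwise op:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.5, 3.0], dtype=np.float32)
idx = np.array([3, 0, 2])

# gather followed by a pointwise op in a separate kernel...
separate = np.maximum(x[idx], 0.0)   # relu(gather(x, idx))

# ...equals computing the pointwise op on the gathered elements,
# i.e. it can be folded into the gather kernel itself.
fused = np.maximum(x, 0.0)[idx]      # gather(relu(x), idx)
```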
From the 22 Feb 2024 performance model review of Distilgpt2: There is a where before the softmax which prevents us from using flash attention: ``` @34 = gpu::code_object[code_object=9224,symbol_name=where_kernel,global=363312,local=1024,](@33,@30,@32) -> half_type,...
Comment out the qlinear_reused matcher because of an accuracy error for quantized resnet50
* There's an accuracy error resulting from the `qlinear_reused` matcher in `simplify_qdq`. * Note that the other half of the quantized resnet50 accuracy issue was from a disconnect between...
* During the migraphx graph optimizations introduction presentation I showed a situation where we could have used the distributive property of matrix multiplication to produce a more optimized graph than...
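The distributive rewrite in question, sketched with NumPy: two gemms sharing a left operand plus an add can be replaced by one add plus one gemm, since A·B + A·C = A·(B + C), trading a gemm for a much cheaper elementwise add:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 5))
c = rng.standard_normal((4, 5))

# Two gemms and an add...
two_gemms = a @ b + a @ c

# ...rewritten via distributivity into one add and one gemm.
one_gemm = a @ (b + c)
```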
* With our changes to softmax, we no longer use the log_softmax instruction that does the log and the softmax in one step. * We need to make a matcher...
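Why the fused instruction matters, sketched under the usual numerical-stability argument: computing `log(softmax(x))` as two steps overflows `exp` for large inputs, while the fused log-softmax form `x - max - log(sum(exp(x - max)))` stays finite, so a matcher rewriting log-of-softmax back into one instruction preserves both speed and accuracy:

```python
import numpy as np

def log_softmax(x):
    # Fused, numerically stable form: x - max - log(sum(exp(x - max))).
    m = x.max(axis=-1, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=-1, keepdims=True))

x = np.array([[1.0, 2.0, 3.0]])
naive = np.log(np.exp(x) / np.exp(x).sum(axis=-1, keepdims=True))

# For large inputs the naive two-step form overflows exp...
x_big = np.array([[1000.0, 1001.0]])
with np.errstate(over="ignore", divide="ignore"):
    naive_big = np.log(np.exp(x_big) / np.exp(x_big).sum(axis=-1, keepdims=True))
# ...while the fused form remains finite.
fused_big = log_softmax(x_big)
```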