AMDMIGraphX
Missing constant propagation: `Literal` -> `Multibroadcast` -> `Quantizelinear`
- Found during Inference Model Review meeting
- Seen in bert_base_cased and distilgpt2_fp16 when run with our `--fp8` flag, and probably also with `--int8`
@12 = hip::hip_copy_literal[id=main:@literal:17] -> half_type, {768, 2304}, {2304, 1}
@13 = load[offset=188743680,end=190513152](@1) -> fp8e4m3fnuz_type, {64, 768, 2304}, {0, 2304, 1}
@14 = multibroadcast[out_lens={64, 768, 2304},out_dyn_dims={}](@12) -> half_type, {64, 768, 2304}, {0, 2304, 1}
@15 = gpu::code_object[code_object=5088,symbol_name=quantizelinear_kernel,global=113246208,local=1024,](@14,@13) -> fp8e4m3fnuz_type, {64, 768, 2304}, {0, 2304, 1}
- Example from distilgpt2_fp16
- driver command:
`bin/driver perf /codes/distilgpt2_1_fp16_gpu.onnx --fp8 --fill1 input_ids --input-dim @input_ids 64 384 --batch 64`
- The input to instruction `@15 quantizelinear` is a broadcasted literal. The broadcast instruction should have been swapped by the `find_inner_broadcasts` matcher in the `simplify_algebra` compiler pass, which would then allow the `propagate_constant` pass to fold it into a constant.
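
To illustrate why the swap matters, here is a minimal sketch in plain Python. It is not MIGraphX code: `quantize_linear` and `multibroadcast` are hypothetical toy helpers, the quantization is a simplified int8-style formula rather than the fp8 kernel, and the scale is assumed to be a constant as well. It only shows that, for an elementwise op, quantizing after the broadcast and quantizing before it give the same values, so the quantization of a literal could be done once at compile time.

```python
# Toy illustration only: none of these helpers are MIGraphX APIs.

def quantize_linear(values, scale, zero_point=0, lo=-128, hi=127):
    """Elementwise int8-style quantization: clamp(round(x / scale) + zero_point)."""
    return [max(lo, min(hi, round(v / scale) + zero_point)) for v in values]

def multibroadcast(row, copies):
    """Stand-in for a stride-0 broadcast: repeat the same row `copies` times."""
    return [list(row) for _ in range(copies)]

literal = [0.5, -1.25, 2.0, 3.75]  # stand-in for the {768, 2304} literal
scale = 0.25                       # assumed constant for this sketch
batch = 3                          # stand-in for the broadcast batch of 64

# Order seen in the issue: broadcast first, then quantize every copy at run time.
broadcast_then_quantize = [
    quantize_linear(row, scale) for row in multibroadcast(literal, batch)
]

# Order after the swap: quantize the literal once, then broadcast the result.
quantize_then_broadcast = multibroadcast(quantize_linear(literal, scale), batch)

same_result = broadcast_then_quantize == quantize_then_broadcast
print(same_result)
```

In the second ordering the quantization has only constant inputs, which is the shape of graph a constant-folding pass can evaluate ahead of time; the remaining broadcast is a stride-0 view and needs no per-element work.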