AMDMIGraphX
Fuse `gather` and `pointwise`
From the 22 Feb 2024 performance model review of Distilgpt2:
Although the gain might be minor, we could fuse the pointwise operation into gather to get rid of the extra add_kernel:
@8 = gpu::code_object[code_object=9224,symbol_name=gather_kernel,global=225280,local=1024,](@6,@5,@7) -> half_type, {1, 348, 768}, {267264, 768, 1}
@9 = load[offset=0,end=534528](@1) -> half_type, {1, 348, 768}, {267264, 768, 1}
@10 = gpu::code_object[code_object=9088,symbol_name=add_kernel,global=133632,local=1024,](@8,@2,@9) -> half_type, {1, 348, 768}, {267264, 768, 1}
We would need to extend the gather kernel to also perform the pointwise operation. It is unclear whether this fusion would improve performance or possibly make it worse.
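To make the idea concrete, here is a minimal CPU reference sketch of what the fusion would compute. It is not MIGraphX code: the function names (`gather_rows`, `add_pointwise`, `gather_add_fused`) are hypothetical, and it assumes a gather along axis 0 of a 2-D table with a same-shaped second operand for the add. The unfused path materializes the gathered tensor and then runs a second pass over it; the fused path applies the add while gathering, so no intermediate buffer is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helpers for illustration only (not MIGraphX APIs).

// Gather along axis 0 of a row-major 2-D table: out[i, :] = table[indices[i], :].
std::vector<float> gather_rows(const std::vector<float>& table,
                               std::size_t width,
                               const std::vector<std::int64_t>& indices)
{
    assert(width > 0 && table.size() % width == 0);
    const std::size_t rows = table.size() / width;
    std::vector<float> out(indices.size() * width);
    for(std::size_t i = 0; i < indices.size(); ++i)
    {
        assert(indices[i] >= 0 && static_cast<std::size_t>(indices[i]) < rows);
        const std::size_t src = static_cast<std::size_t>(indices[i]) * width;
        for(std::size_t j = 0; j < width; ++j)
            out[i * width + j] = table[src + j];
    }
    return out;
}

// Separate pointwise pass, standing in for add_kernel.
std::vector<float> add_pointwise(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for(std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] + b[i];
    return out;
}

// Fused variant: the add is applied as each gathered element is produced,
// so the intermediate gathered tensor is never stored.
std::vector<float> gather_add_fused(const std::vector<float>& table,
                                    std::size_t width,
                                    const std::vector<std::int64_t>& indices,
                                    const std::vector<float>& other)
{
    assert(width > 0 && table.size() % width == 0);
    assert(other.size() == indices.size() * width);
    const std::size_t rows = table.size() / width;
    std::vector<float> out(indices.size() * width);
    for(std::size_t i = 0; i < indices.size(); ++i)
    {
        assert(indices[i] >= 0 && static_cast<std::size_t>(indices[i]) < rows);
        const std::size_t src = static_cast<std::size_t>(indices[i]) * width;
        for(std::size_t j = 0; j < width; ++j)
            out[i * width + j] = table[src + j] + other[i * width + j];
    }
    return out;
}
```

Both paths should produce identical results; the only difference is whether the gathered values make a round trip through memory before the add.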
Deliverables:
- Investigate whether fusing pointwise kernels with gather would improve performance
- If we can improve performance with the fusion, implement the fusion
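As a starting point for the investigation, a rough estimate of global memory traffic can be sketched from the shapes in the dump above. This is a back-of-the-envelope model, not a measurement: it assumes each kernel reads and writes every tensor element exactly once, that the second operand of the add is a full `{1, 348, 768}` half tensor, and it ignores the index reads and any cache effects. All names are hypothetical.

```cpp
#include <cstddef>

// Shape {1, 348, 768} of half_type, taken from the IR dump.
constexpr std::size_t elements       = 1 * 348 * 768;
constexpr std::size_t bytes_per_half = 2;
constexpr std::size_t tensor_bytes   = elements * bytes_per_half;

// Unfused (assumed model): gather reads the selected rows and writes the
// intermediate; add reads the intermediate and the other operand, then
// writes the output. That is five tensor-sized transfers.
constexpr std::size_t unfused_bytes = 5 * tensor_bytes;

// Fused (assumed model): read the selected rows, read the other operand,
// write the output. That is three tensor-sized transfers.
constexpr std::size_t fused_bytes = 3 * tensor_bytes;

// Traffic that the fusion would avoid under this model.
constexpr std::size_t saved_bytes = unfused_bytes - fused_bytes;

// tensor_bytes agrees with the `load[offset=0,end=534528]` in the dump.
static_assert(tensor_bytes == 534528, "tensor size should match the load range");
```

Under these assumptions the fusion would remove two tensor-sized transfers (about 1 MB) and one kernel launch. Whether that shows up as a real speedup still has to be measured, since the fused kernel might have worse occupancy or access patterns than the two separate kernels.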