AMDMIGraphX

Fuse `gather` and `pointwise`

Open CharlieL7 opened this issue 1 year ago • 0 comments

From the 22 Feb 2024 performance model review of Distilgpt2:

Although the gain might be minor, we could fuse the pointwise `add` into `gather` to get rid of the extra add_kernel launch:

@8 = gpu::code_object[code_object=9224,symbol_name=gather_kernel,global=225280,local=1024,](@6,@5,@7) -> half_type, {1, 348, 768}, {267264, 768, 1}
@9 = load[offset=0,end=534528](@1) -> half_type, {1, 348, 768}, {267264, 768, 1}
@10 = gpu::code_object[code_object=9088,symbol_name=add_kernel,global=133632,local=1024,](@8,@2,@9) -> half_type, {1, 348, 768}, {267264, 768, 1}

We would need to extend the gather kernel so that it also performs the pointwise operation. It is not yet clear whether this fusion would improve performance or make it worse.
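To make the proposal concrete, here is a minimal Python sketch of what the fusion computes. It is not MIGraphX code: `gather_rows`, `add`, and `fused_gather_add` are hypothetical names, and the sketch assumes a row gather along axis 0 followed by an elementwise add, which is what the dump above appears to show.

```python
# Hypothetical illustration only; none of these names exist in MIGraphX.

def gather_rows(table, indices):
    """Unfused step 1: copy the selected rows into an intermediate buffer."""
    return [list(table[i]) for i in indices]


def add(a, b):
    """Unfused step 2: elementwise add that re-reads the intermediate buffer."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fused_gather_add(table, indices, other):
    """Fused version: each gathered element is added as it is read,
    so no intermediate buffer is materialized."""
    return [
        [x + y for x, y in zip(table[i], row)]
        for i, row in zip(indices, other)
    ]


table = [[0.0, 1.0], [10.0, 11.0], [20.0, 21.0]]
indices = [2, 0, 2]
other = [[0.5, 0.5], [1.5, 1.5], [2.5, 2.5]]

unfused = add(gather_rows(table, indices), other)
fused = fused_gather_add(table, indices, other)
assert fused == unfused
```

In the dump above, the intermediate result has shape {1, 348, 768} in half_type, which is 534,528 bytes. A fused kernel would avoid writing and then re-reading that buffer and would save one kernel launch, at the cost of a more complex gather kernel. Whether that is a net win is exactly what the investigation needs to measure.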

Deliverables:

  • Investigate whether fusing pointwise kernels with gather would improve performance
  • If we can improve performance with the fusion, implement the fusion

CharlieL7 • Feb 22 '24 18:02