Fuse convolution + reshape + transpose + sigmoid
@404 = gpu::code_object[code_object=6464,symbol_name=mlir_convolution_add,global=102400,local=256,](@402,@293,@400,@403) -> half_type, {1, 255, 80, 80}, {1632000, 6400, 80, 1}, target_id=0: 0.0226192ms, 1%
@405 = reshape_lazy[dims={1, 3, 85, 80, 80}](@404) -> half_type, {1, 3, 85, 80, 80}, {1632000, 544000, 6400, 80, 1}, target_id=0: 0.00101232ms, 1%
@406 = transpose[permutation={0, 1, 3, 4, 2}](@405) -> half_type, {1, 3, 80, 80, 85}, {1632000, 544000, 80, 1, 6400}, target_id=0: 0.00064004ms, 1%
@407 = load[offset=4233600,end=7497600](@1) -> half_type, {1, 3, 80, 80, 85}, {1632000, 544000, 80, 1, 6400}, target_id=0: 0.00062424ms, 1%
@408 = gpu::code_object[code_object=9096,symbol_name=sigmoid_kernel,global=816000,local=1024,](@406,@407) -> half_type, {1, 3, 80, 80, 85}, {1632000, 544000, 80, 1, 6400}, target_id=0: 0.0194867ms, 1%
This pattern is from the YOLOv5s model. In this case, convolution + add + reshape + transpose + sigmoid can be fused, but they are not currently being fused.
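For anyone who wants to poke at this, here is a rough stand-alone sketch (untested) that builds the same pattern with MIGraphX's C++ module API. The 128 input channels, the weight shape, and the per-channel bias are my assumptions chosen to match the {1, 255, 80, 80} output in the trace; the reshape only becomes reshape_lazy after lowering:

```cpp
#include <migraphx/program.hpp>
#include <migraphx/make_op.hpp>
#include <migraphx/shape.hpp>

migraphx::program build_pattern()
{
    migraphx::program p;
    auto* mm = p.get_main_module();
    // Assumed input/weight shapes; only the {1, 255, 80, 80} conv output
    // is confirmed by the trace above.
    auto x = mm->add_parameter(
        "x", migraphx::shape{migraphx::shape::half_type, {1, 128, 80, 80}});
    auto w = mm->add_parameter(
        "w", migraphx::shape{migraphx::shape::half_type, {255, 128, 1, 1}});
    auto b = mm->add_parameter(
        "b", migraphx::shape{migraphx::shape::half_type, {255}});
    auto conv = mm->add_instruction(migraphx::make_op("convolution"), x, w);
    // Broadcast the per-channel bias along the channel axis, then add.
    auto bb = mm->add_instruction(
        migraphx::make_op("broadcast", {{"axis", 1}, {"out_lens", {1, 255, 80, 80}}}), b);
    auto sum = mm->add_instruction(migraphx::make_op("add"), conv, bb);
    auto rsp = mm->add_instruction(
        migraphx::make_op("reshape", {{"dims", {1, 3, 85, 80, 80}}}), sum);
    auto t = mm->add_instruction(
        migraphx::make_op("transpose", {{"permutation", {0, 1, 3, 4, 2}}}), rsp);
    mm->add_instruction(migraphx::make_op("sigmoid"), t);
    return p;
}
```

Compiling this for the gpu target should reproduce the unfused reshape_lazy / transpose / sigmoid sequence shown in the trace above.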
The fuse_mlir pass already does the fusion for conv + pointwise.
https://github.com/ROCm/AMDMIGraphX/blob/6e505cd43c5286f4fafff9b8b863fe94e8735a25/src/targets/gpu/fuse_mlir.cpp#L408
But the fuse_pointwise pass cannot fuse pointwise modules across the transposes, so fuse_mlir ends up fusing only the convolution + add.
https://github.com/ROCm/AMDMIGraphX/blob/6e505cd43c5286f4fafff9b8b863fe94e8735a25/src/fuse_pointwise.cpp#L181
https://github.com/ROCm/AMDMIGraphX/blob/6e505cd43c5286f4fafff9b8b863fe94e8735a25/src/fuse_pointwise.cpp#L202
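To illustrate the limitation, here is a simplified, hypothetical predicate; the real pass uses MIGraphX matchers (see the links above), but the idea is the same: fusion only triggers when a pointwise instruction feeds a pointwise instruction directly:

```cpp
#include <algorithm>
#include <migraphx/instruction.hpp>
#include <migraphx/instruction_ref.hpp>

// Simplified stand-in for the fuse_pointwise match condition: a pointwise
// instruction is fusable only if one of its *direct* inputs is also pointwise.
bool fuses_with_input(migraphx::instruction_ref ins)
{
    if(ins->name() != "pointwise")
        return false;
    const auto& inputs = ins->inputs();
    return std::any_of(inputs.begin(), inputs.end(), [](migraphx::instruction_ref in) {
        // In the YOLOv5s pattern, the sigmoid's input here is the transpose
        // (fed by the reshape), not the add's pointwise module, so this is
        // false and the two pointwise modules are never merged.
        return in->name() == "pointwise";
    });
}
```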
This issue may be the same as https://github.com/ROCm/AMDMIGraphX/issues/2822 or https://github.com/ROCm/AMDMIGraphX/issues/2813.
@pfultz2 Do you think this needs to be handled inside the pointwise fusion rather than the fuse_mlir pass?
https://github.com/ROCm/AMDMIGraphX/pull/3280 doesn't solve this, because the "transpose" sits between two pointwise ops, and fuse_pointwise won't fuse two pointwises across transposes.
Would it somehow be possible to fuse pointwises across transposes? Otherwise, I find it better to run the fuse_mlir pass first and then fuse_pointwise_reduce.
> Would it somehow be possible to fuse pointwises across transposes?
Yes, we need to extend rewrite_reshapes to handle that by updating the axis map with the new permutation.
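If it helps, here is a minimal sketch of the bookkeeping that answer implies, assuming the axis map is a plain vector where axis_map[i] is the output axis that source axis i currently lands on (rewrite_reshapes' actual representation may well differ):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Compose an axis map with a transpose permutation, where output axis j of
// the transpose reads input axis perm[j].
std::vector<std::size_t> update_axis_map(const std::vector<std::size_t>& axis_map,
                                         const std::vector<std::size_t>& perm)
{
    // Invert the permutation: inv[perm[j]] = j.
    std::vector<std::size_t> inv(perm.size());
    for(std::size_t j = 0; j < perm.size(); j++)
        inv[perm[j]] = j;
    // An axis that sat at position a before the transpose sits at inv[a] after.
    std::vector<std::size_t> result(axis_map.size());
    std::transform(axis_map.begin(), axis_map.end(), result.begin(),
                   [&](std::size_t a) { return inv[a]; });
    return result;
}
```

Composing with the inverse permutation is what keeps the map consistent across the transpose: for the {0, 1, 3, 4, 2} permutation in this trace, the old channel-split axis 2 would be remapped to position 4.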
Okay. Is the following the tracking issue? https://github.com/ROCm/AMDMIGraphX/issues/2895