[RELAX][PASS] Annotate Custom Scope layout pass for Adreno GPU
Refer to https://discuss.tvm.apache.org/t/rfc-annotate-custom-scope-layout-relax-pass-for-adreno-gpu/18052/6 for details about texture scope handling.
@tvm-bot rerun
@Hzfengsy do you mind taking a look, given it touches FuseOps/FuseTIR?
also cc @yongwww for memory scope related changes
@tvm-bot rerun
Thanks @srkreddy1238 for the updates. I took a closer look and now understand the motivation behind add_attributes: it is mainly to handle the conv2d operators where texture memory can be supported.
However, attaching op attributes to call_tir has an undesirable impact. The specification of call_tir originally did not have to deal with these attributes, and carrying them results in a "leak through" that increases the surface area for developers working with call_tir.
I also now understand that the goal is to let the finally fused call_tir function decide whether texture memory is feasible.
I think it is cleaner to try a different approach. Instead of relying on the legalize pass, let us introduce an Adreno-specific conv_dispatch that runs before legalization to offload these conv operators. We specifically attach the attribute tir.opencl_texture_2d_supported = true to the call node.
Now the remaining question is where the schedule can appear:
- The cleanest way is to have relax.adreno.conv_dispatch call the dlight schedule, construct such a call_tir, and mark it as already scheduled (as sketched below). The only issue is that the follow-up FuseOps/FuseTIR would then have to treat this call as opaque, and we do not yet have the capability to run more fusions on it. But we should be fine getting the right conv2d op scheduled.
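A minimal sketch of what the tagging could look like. Only `tir.is_scheduled` is an existing convention (dlight skips functions carrying it); `tir.opencl_texture_2d_supported` is the flag proposed in this thread, and while the proposal attaches it to the call node, the sketch tags the PrimFunc purely for illustration:

```python
import tvm
from tvm.script import tir as T


# Placeholder PrimFunc standing in for the dlight-scheduled conv2d.
@T.prim_func
def conv2d_stub(A: T.Buffer((16,), "float32"), B: T.Buffer((16,), "float32")):
    for i in range(16):
        with T.block("compute"):
            vi = T.axis.spatial(16, i)
            B[vi] = A[vi] + T.float32(1.0)


# "tir.is_scheduled" tells dlight to leave this function alone;
# "tir.opencl_texture_2d_supported" is the flag proposed above.
tagged = conv2d_stub.with_attr("tir.is_scheduled", True)
tagged = tagged.with_attr("tir.opencl_texture_2d_supported", True)
```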
To further enable fusion, one can try adopting the following customized legalize sequence (sketched as a pipeline after the list):
- S0: relax.adreno.conv_dispatch: run the conv dispatch and mark the call as opaque with tir.opencl_texture_2d_supported = true
- S1: run legalize and analysis
- S2: do a pattern match to manually fuse the ewise ops onto the conv2d, creating a sub-function that calls into conv2d and then ewise, which can then be consumed by FuseTIR
- S3: run FuseOps (this will try to fuse the other ops)
- S4: run FuseTIR
- S5: run dlight
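A rough sketch of that sequence as a pass pipeline. The S0 and S2 passes do not exist yet (hypothetical, shown commented out); the remaining passes are existing TVM transforms:

```python
import tvm
from tvm import dlight as dl
from tvm import relax

mod = tvm.IRModule()  # stand-in for the real input model

seq = tvm.ir.transform.Sequential(
    [
        # S0: hypothetical relax.adreno.conv_dispatch pass, marks conv2d
        #     calls opaque with tir.opencl_texture_2d_supported = true
        relax.transform.LegalizeOps(),  # S1: legalize the remaining ops
        # S2: hypothetical pattern-match pass fusing ewise onto conv2d
        relax.transform.FuseOps(),      # S3: fuse the other ops
        relax.transform.FuseTIR(),      # S4
    ]
)

mod = seq(mod)
with tvm.target.Target("opencl"):
    mod = dl.ApplyDefaultSchedule(dl.gpu.Fallback())(mod)  # S5: dlight
```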
Belatedly realized I could have drafted an RFC to describe the approach. Done now: https://discuss.tvm.apache.org/t/rfc-annotate-custom-scope-layout-relax-pass-for-adreno-gpu/18052
@tqchen thanks for the thoughts. A few concerns I have with this approach:
- tir.opencl_texture_2d_supported = true: I assume this flag will be used to realize the VDevice in struct_info after FuseTIR. A flag alone may not be sufficient; we might need scope information for each input, and that information has to stay consistent as we pass through the fusion passes (see the sketch below).
- Another moderate challenge is in S2, where we need to define and maintain a BYOC-like pattern table to ensure maximum fusion possibilities.
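For illustration only, the information that would need to survive fusion looks more like this than a single boolean; "global.texture"/"global" are the actual OpenCL memory scopes in question, but the attribute names here are made up:

```python
# A bare flag loses track of which argument sits in texture vs. buffer
# memory; all key names below are hypothetical, not an existing convention.
scope_hint = {
    "tir.opencl_texture_2d_supported": True,             # the proposed flag
    "arg_scopes": ["global.texture", "global.texture"],  # one entry per input
    "out_scope": "global.texture",
}
```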
Pls advise.
Updated the RFC @ https://discuss.tvm.apache.org/t/rfc-annotate-custom-scope-layout-relax-pass-for-adreno-gpu/18052/6
@tvm-bot rerun
@tqchen can you take a look at this?
@tvm-bot rerun
@tqchen please handle this before any conflicts arise. The corresponding PRs for TIR lowering and runtime are waiting on this.
@tqchen
One catch I came across in the current implementation is representing the return-type sinfo_args for ops like conv2d in TVMScript:
```python
lv2: R.Tensor((2, 8, 54, 54, 4), dtype="float32", vdevice="opencl:1:global") = R.nn.conv2d(lv, lv1, strides=[1, 1], padding=[0, 0, 0, 0], dilation=[1, 1], groups=1, data_layout="NCHW4c", kernel_layout="OIHW4o", out_layout="NCHW4c", out_dtype="float32", sinfo_args=(R.Tensor((2, 8, 54, 54, 4), dtype="float32", vdevice="opencl:1:global"),))
```
Any advice?
Sorry, I have been busy with some stuff; should be able to do a pass around May.
Sorry for the delay in getting back, due to the recent focus on the FFI refactoring. I think the main remaining item is the following change:
```c++
StructInfo infered_sinfo = infer_struct_info_map[op](GetRef<Call>(call), builder_);
legalized = UpdateVDeviceOutStructInfo(legalized, visited_call, infered_sinfo);
```
where the struct info inference function is called during legalization (I commented on this previously, which might have been overlooked). The primary principle we have is to automate struct info deduction during building, so in theory every pre-visit call already has its struct info populated, and most rewrites should not change the struct info post-visit. Doing explicit deduction within a rewrite is a pattern we want to avoid.
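As a minimal illustration of that principle (a sketch, not code from this PR): calls emitted through the BlockBuilder already get their struct info deduced at emit time, so a rewrite normally never needs to invoke the infer map by hand:

```python
import tvm
from tvm import relax

bb = relax.BlockBuilder()
x = relax.Var("x", relax.TensorStructInfo((2, 4), "float32"))
with bb.function("main", [x]):
    # struct info of `lv` is deduced automatically when emitted;
    # no explicit call into the infer_struct_info map is needed.
    lv = bb.emit(relax.op.add(x, x))
    bb.emit_func_output(lv)

print(lv.struct_info)  # R.Tensor((2, 4), dtype="float32")
```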
Understood. The challenge lies in the legalization util linked below, which sets the output_sinfo (i.e. the VDevice info) derived from any one of the call's arguments instead of inferring it from the original operator's infer API. I will improve this part ...
https://github.com/apache/tvm/blob/437d00afcd5496700703427cc4ce316a35d35baa/python/tvm/relax/utils.py#L324
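Roughly the pattern in question, paraphrased (not the actual utils.py code): the output VDevice is taken from whichever argument happens to carry one, rather than from the operator's own struct info inference:

```python
from tvm import relax


def vdevice_from_args(args):
    """Paraphrase of the problematic behaviour described above: the first
    argument carrying a VDevice wins, which can go wrong once the inputs
    live in mixed memory scopes."""
    for arg in args:
        sinfo = arg.struct_info
        if isinstance(sinfo, relax.TensorStructInfo) and sinfo.vdevice is not None:
            return sinfo.vdevice
    return None
```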
@tqchen we are good now with the recommended changes. Pls take a look.
@tqchen rebased and up for review.