[DietCode] Local Padding
This PR contains the code generation changes required for the dynamic MetaScheduler (see apache/tvm-rfcs#72 for the RFC and #11516 for the tracking issue describing the changes). Any feedback or comments are welcome.
FYI, @comaniac @junrushao1994
Also cc @Hzfengsy @vinx13 @spectrometerHBH @masahi
Per offline discussion with @junrushao1994 and @ArmageddonKnight, here are the current action items:
- The local padding pass will be moved into a TIR transformation, meaning that local padding becomes an implicit transformation similar to loop partitioning. A config will be exposed to control whether it is turned on or off (default: off) so that all current workloads remain unchanged (see the sketch after this list).
- In the local padding implementation, the logic that relies on var node name hints will be improved to use a more reliable identifier (e.g., the pointer reference).
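For reference, here is a minimal sketch (not part of the PR itself) of how such a config flag would be toggled through `PassContext`. The flag name `tir.enable_local_pad` is taken from the test discussion further below; the tiny placeholder workload and the assumption that this PR's pass is applied (so the flag is registered) are mine.

```python
import tvm
from tvm import te

# Placeholder workload; any schedule would do for demonstrating the flag.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
sched = te.create_schedule(B.op)

# Build the same schedule with local padding disabled (the proposed default)
# and enabled. Note: without this PR applied, the config key is unregistered
# and PassContext would reject it.
for enable in (False, True):
    with tvm.transform.PassContext(config={"tir.enable_local_pad": enable}):
        mod = tvm.build(sched, [A, B], target="llvm")
```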
@junrushao1994 @Hzfengsy I have finished the revision. Please have a second look when you have time.
Also cc @comaniac
It seems that the CI build has stopped for some reason (I am unable to query the current CI status). Would it be possible to re-trigger the CI?
@tvm-bot rerun
Hi @ArmageddonKnight, it seems that the TVM transform config "tir.enable_local_pad" does not work: the kernel source code built from the same schedule is identical whether the config is set to true or false. I am using the test example you uploaded before; the example code is shown below:
```python
def save_kernel_source(kernel, log_kernel_filename):
    kernel_src = kernel.imported_modules[0].get_source()
    if log_kernel_filename is not None:
        with open(log_kernel_filename, 'w') as fout:
            fout.write("{}".format(kernel_src))
    else:
        print("{}".format(kernel_src))


@tvm.testing.requires_gpu
@tvm.testing.requires_cuda
def test_dense_local_padding():
    """Test that local padding is delivering the correct compute outcome."""
    x_np = np.random.uniform(-0.1, 0.1, size=(960, 770)).astype(np.float32)
    w_np = np.random.uniform(-0.1, 0.1, size=(770, 2304)).astype(np.float32)
    y_np = np.matmul(x_np, w_np)
    y_empty = np.empty(shape=y_np.shape, dtype=y_np.dtype)
    tir_sched = Schedule(Dense_960x770x2304)
    sample_dense_sched(tir_sched)
    with tvm.transform.PassContext(config={"tir.enable_local_pad": False}):
        nopad_cuda_kernel = tvm.build(tir_sched.mod["main"], [], target="cuda")
        save_kernel_source(nopad_cuda_kernel, "nolocalpad_kernel.cu")
    with tvm.transform.PassContext(config={"tir.enable_local_pad": True}):
        cuda_kernel = tvm.build(tir_sched.mod["main"], [], target="cuda")
        save_kernel_source(cuda_kernel, "localpad_kernel.cu")

    cuda_ctx = tvm.cuda()
    module_data = [x_np, w_np, y_empty]
    module_data = [tvm.nd.array(d, device=cuda_ctx) for d in module_data]
    cuda_kernel(*module_data)
    np.testing.assert_allclose(module_data[-1].numpy(), y_np, atol=1e-3, rtol=1e-3)
```
The generated localpad_kernel.cu is the same as nolocalpad_kernel.cu.
@renfeier That is because we are refactoring the implementation, so the pass itself is temporarily commented out. Sorry, I have been quite busy with university business; I will finish the refactoring soon.
@ArmageddonKnight Thank you for the prompt reply. Looking forward to your update.
@junrushao1994 As was discussed, I have fixed the implementation. Please review it again.
Hmm ... it seems that the Cortex CI pipelines keep being interrupted for some reason, and this is happening on the main branch as well.
@junrushao1994 The refactored implementation has passed the CI tests. Please review it when you have time available. Thanks.
Hi @junrushao, it has been some time since this PR was submitted. May I know whether there are any updates on it, and whether further changes are required?
@ArmageddonKnight @junrushao What is the status of this PR or DietCode upstreaming in general? I'm interested in dynamic shape tuning, and I can help this effort.
This looks similar to https://github.com/apache/tvm/pull/12750, maybe we don't need this? cc @vinx13
@masahi PadEinsum can achieve something similar, since the padding is applied in shared memory.
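For comparison, here is a rough sketch of how the `pad_einsum` primitive from #12750 is typically applied to a matmul block. The shapes, block name, and padding factors are illustrative only (not taken from this PR), and the primitive's exact preconditions may differ across TVM versions.

```python
import tvm
from tvm import te, tir

# Illustrative matmul whose reduction extent (770) is not a multiple of a
# typical tile size, so padding is actually needed along k.
M, N, K = 960, 2304, 770
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

sch = tir.Schedule(te.create_prim_func([A, B, C]))
# Pad the i/j/k iteration domain of the matmul block up to multiples of 32.
# The padded copies live in intermediate buffers rather than in the original
# function arguments, which is what makes the effect comparable to local padding.
sch.pad_einsum(sch.get_block("C"), [32, 32, 32])
print(sch.mod.script())
```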