tvm [Unity][Dlight] Enabling Fast and Efficient Kernel Generation by leveraging Hardware information

Open LeiWang1999 opened this issue 1 year ago • 0 comments

This pull request enhances Dlight by using hardware information to recommend tile candidates, which enables fast tuning.
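To make the idea of hardware-aware tile recommendation concrete, here is a minimal, self-contained sketch. It is not the code in this pull request: `HardwareInfo`, `recommend_tiles`, the fixed 8x8 per-thread patch, and the memory-traffic score are all hypothetical and chosen only for illustration. It enumerates tile shapes for a matmul, drops those that violate the hardware limits, and ranks the rest by estimated global-memory traffic.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HardwareInfo:
    """Hypothetical hardware description; field names are illustrative."""
    shared_mem_bytes: int       # shared memory available to one thread block
    max_threads_per_block: int
    warp_size: int


def recommend_tiles(M, N, K, hw, dtype_bytes=2, topk=5):
    """Return up to `topk` (block_m, block_n, block_k) tiles for an M x N x K matmul."""
    sizes = [16, 32, 64, 128, 256]
    scored = []
    for bm, bn, bk in product(sizes, sizes, [16, 32, 64]):
        if bm > M or bn > N or bk > K:
            continue
        # One tile of A (bm x bk) and one of B (bk x bn) must fit in shared memory.
        smem = (bm * bk + bk * bn) * dtype_bytes
        if smem > hw.shared_mem_bytes:
            continue
        # Assumption: every thread computes an 8x8 patch of the output tile.
        threads = (bm * bn) // 64
        if threads == 0 or threads > hw.max_threads_per_block:
            continue
        if threads % hw.warp_size != 0:
            continue
        # Elements loaded from global memory per output element (lower is better).
        traffic = K * (bm + bn) / (bm * bn)
        scored.append(((traffic, smem), (bm, bn, bk)))
    # Sort by traffic first, then prefer the smaller shared-memory footprint.
    scored.sort(key=lambda item: item[0])
    return [tile for _, tile in scored[:topk]]


hw = HardwareInfo(shared_mem_bytes=48 * 1024, max_threads_per_block=1024, warp_size=32)
for tile in recommend_tiles(1024, 1024, 1024, hw):
    print(tile)
```

Only the few returned candidates would then need to be compiled and measured, rather than the whole search space. A real recommender would presumably use a richer cost model than this single traffic estimate.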

Below is a brief summary of the major changes:

- Introduce the dl.ApplyFastTuning pass.
- Add a skip_simplify flag to schedule::reindex. It helps avoid over-simplification that eliminates unit loops, and it works similarly to preserve_unit_loop.
- Improve compute_inline.cc so that simplification handles some complex inline cases better (e.g. layout transforms).
- Simple bug fixes:
  - #16406
  - #16437

Related discussion: https://discuss.tvm.apache.org/t/dlight-enabling-fast-and-efficient-kernel-generation-by-leveraging-hardware-information/16273

TODO Items of this pull request:

[ ] Provide related tests.
[ ] Code style may need guidance (e.g. remove or replace the log prints; refactor some Python components into C++).
[x] Leverage the structural-equality cache to avoid duplicated tuning.
[x] Implement an MMA schedule template with swizzling.
[x] Support dynamic symbolic tuning.
[ ] Bring it to mlc-llm (we may need to improve our design to support dynamic symbolic shapes).
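The structural-equality cache item can be illustrated with a small sketch. It is pure Python and does not use TVM. The nested-tuple kernel representation, the `$` prefix for variables, and the names `structural_key` and `TuningCache` are all assumptions made for this example. The idea is that two kernels which differ only in variable names share one cache entry, so tuning runs once.

```python
def structural_key(expr, names=None):
    """Canonical form of a nested-tuple expression that ignores variable names."""
    if names is None:
        names = {}
    if isinstance(expr, str) and expr.startswith("$"):
        # Variables are renamed by order of first appearance: v0, v1, ...
        return names.setdefault(expr, f"v{len(names)}")
    if isinstance(expr, tuple):
        return tuple(structural_key(e, names) for e in expr)
    return expr  # constants and operator names are kept as-is


class TuningCache:
    """Runs `tune_fn` once per structurally distinct expression."""

    def __init__(self, tune_fn):
        self._tune_fn = tune_fn
        self._results = {}
        self.misses = 0

    def tune(self, expr):
        key = structural_key(expr)
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._tune_fn(expr)
        return self._results[key]


# Hypothetical tuner stand-in: just records that it was invoked.
cache = TuningCache(lambda expr: {"tuned_for": expr})
first = cache.tune(("matmul", "$A", "$B", 1024))
second = cache.tune(("matmul", "$X", "$Y", 1024))   # same structure, new names
third = cache.tune(("matmul", "$X", "$Y", 2048))    # different shape
print(cache.misses)
```

In TVM itself, the corresponding building blocks would be structural hashing and structural equality on IR nodes. The renaming trick above only mimics how those checks treat variable names.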

Jan 20 '24 18:01 LeiWang1999
