
[Unity][Dlight] Enabling Fast and Efficient Kernel Generation by leveraging Hardware information

Open: LeiWang1999 opened this issue 1 year ago • 0 comments

This pull request enhances Dlight. It primarily incorporates hardware information to recommend tile candidates, which enables fast tuning.
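The PR does not spell out the recommendation policy here (the linked discussion has the details). The general idea of using hardware limits to prune and rank tile candidates can be sketched as follows. This is a minimal sketch, not Dlight's algorithm: `HardwareSpec`, `recommend_tiles`, the scoring formula, and the device numbers (48 KiB shared memory, 108 SMs) are hypothetical and illustrative.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class HardwareSpec:
    # Hypothetical hardware description; a real policy may read other fields.
    shared_mem_bytes: int  # shared memory usable by one thread block
    num_sms: int           # number of streaming multiprocessors


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def recommend_tiles(m: int, n: int, k: int, hw: HardwareSpec,
                    dtype_bytes: int = 2, block_k: int = 32,
                    top_k: int = 4) -> List[Tuple[int, int]]:
    """Rank (tile_m, tile_n) candidates for an m x n x k matmul."""
    bk = min(block_k, k)
    scored = []
    sizes = [16, 32, 64, 128, 256]
    for tm, tn in product(sizes, sizes):
        if tm > m or tn > n:
            continue
        # Bytes of shared memory to stage one A tile (tm x bk) and one B tile (bk x tn).
        smem = (tm + tn) * bk * dtype_bytes
        if smem > hw.shared_mem_bytes:
            continue  # candidate cannot fit on this device
        # FLOPs per byte staged: larger tiles reuse each loaded element more often.
        intensity = (2 * tm * tn * bk) / smem
        # Fraction of SMs that receive at least one block; penalizes oversized tiles.
        occupancy = min(1.0, ceil_div(m, tm) * ceil_div(n, tn) / hw.num_sms)
        scored.append((intensity * occupancy, tm, tn))
    scored.sort(reverse=True)
    return [(tm, tn) for _, tm, tn in scored[:top_k]]


hw = HardwareSpec(shared_mem_bytes=48 * 1024, num_sms=108)
print(recommend_tiles(4096, 4096, 4096, hw))
# [(256, 256), (256, 128), (128, 256), (128, 128)]
```

Only the few top-ranked candidates would then be compiled and measured, which is what makes tuning fast compared with searching the full space.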

Below is a brief summary of the major changes:

  • Introduce the dl.ApplyFastTuning pass.

  • Add a new skip_simplify flag to schedule::reindex: it helps avoid over-optimization that eliminates unit loops. The flag works similarly to preserve_unit_loop.

  • Improve compute_inline.cc to strengthen simplification in some complex inlining cases (e.g., layout transforms).

  • Minor bug fixes:

    • #16406
    • #16437
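To make the unit-loop issue concrete, here is a toy model in plain Python. It is not TVM's reindex implementation: `Loop`, `simplify_loops`, and `tile_template` are hypothetical names. It only shows why a template that expects a fixed loop structure can break when a simplifier silently drops an extent-1 loop, and how a skip_simplify-style switch keeps the structure intact.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Loop:
    name: str
    extent: int


def simplify_loops(loops: List[Loop], skip_simplify: bool = False) -> List[Loop]:
    """Toy simplifier: drop extent-1 loops unless skip_simplify is set."""
    if skip_simplify:
        return list(loops)
    return [lp for lp in loops if lp.extent != 1]


def tile_template(loops: List[Loop]) -> Dict[str, str]:
    """Hypothetical template that assumes a fixed [batch, i, j] loop structure."""
    if len(loops) != 3:
        raise ValueError(f"expected 3 loops, got {len(loops)}")
    batch, i, j = loops
    return {"blockIdx.z": batch.name, "blockIdx.y": i.name, "blockIdx.x": j.name}


nest = [Loop("b", 1), Loop("i", 128), Loop("j", 256)]
print(tile_template(simplify_loops(nest, skip_simplify=True)))
# {'blockIdx.z': 'b', 'blockIdx.y': 'i', 'blockIdx.x': 'j'}
try:
    tile_template(simplify_loops(nest))
except ValueError as err:
    print("template failed:", err)  # template failed: expected 3 loops, got 2
```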

Related discussion: https://discuss.tvm.apache.org/t/dlight-enabling-fast-and-efficient-kernel-generation-by-leveraging-hardware-information/16273

TODO items for this pull request:

  • [ ] Add related tests.
  • [ ] Code style may need guidance (e.g., remove or replace the log prints; refactor some Python components into C++).
  • [x] Leverage a structural-equality cache to avoid duplicate tuning.
  • [x] Implement an MMA schedule template with swizzling.
  • [x] Support dynamic symbolic tuning.
  • [ ] Bring it to mlc-llm (the design may need improvement to support dynamic symbolic shapes).
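The structural-equality cache amounts to keying tuning results on a workload's structure rather than on object identity or variable names. In TVM itself this kind of comparison is what tvm.ir.structural_equal and tvm.ir.structural_hash provide; the sketch below substitutes a plain-Python stand-in so it runs without TVM. `canonical_key`, `TuningCache`, and the tuple encoding of workloads are hypothetical.

```python
from typing import Any, Callable, Dict, Optional


def canonical_key(expr: Any, names: Optional[Dict[str, str]] = None) -> Any:
    """Structural key for a nested-tuple workload that ignores variable names.

    Strings starting with '%' are treated as variables and renamed in order of
    first appearance, so alpha-equivalent workloads map to the same key.
    """
    if names is None:
        names = {}
    if isinstance(expr, str) and expr.startswith("%"):
        if expr not in names:
            names[expr] = f"%v{len(names)}"
        return names[expr]
    if isinstance(expr, tuple):
        return tuple(canonical_key(e, names) for e in expr)
    return expr


class TuningCache:
    """Run the (expensive) tuner once per structurally distinct workload."""

    def __init__(self, tune_fn: Callable[[Any], Any]):
        self._tune_fn = tune_fn
        self._results: Dict[Any, Any] = {}
        self.num_tuned = 0

    def get(self, workload: Any) -> Any:
        key = canonical_key(workload)
        if key not in self._results:
            self.num_tuned += 1
            self._results[key] = self._tune_fn(workload)
        return self._results[key]


cache = TuningCache(tune_fn=lambda w: {"tile": (128, 128)})
mm_a = ("matmul", ("%A", "%B"), (1024, 1024, 1024), "float16")
mm_b = ("matmul", ("%X", "%Y"), (1024, 1024, 1024), "float16")  # same structure, new names
mm_c = ("matmul", ("%A", "%B"), (512, 1024, 1024), "float16")   # different shape
for w in (mm_a, mm_b, mm_c):
    cache.get(w)
print(cache.num_tuned)  # 2
```

A model with many identical layers then pays the tuning cost once per distinct kernel shape rather than once per layer.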
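One common form of swizzling for MMA-style kernels XORs the 16-byte chunk index with the row index, so that a column of chunks spreads across all shared-memory banks instead of piling onto one. The sketch assumes 128-byte rows (64 float16 values) and the 32-bank, 4-byte-wide shared memory of NVIDIA GPUs; the layout used by the PR's template may differ, and `swizzle_chunk` and `bank_group` are hypothetical names.

```python
def swizzle_chunk(row: int, chunk: int, chunks_per_row: int = 8) -> int:
    """XOR-swizzle a 16-byte chunk index; chunks_per_row must be a power of two."""
    return chunk ^ (row % chunks_per_row)


def bank_group(row: int, chunk: int, swizzle: bool,
               row_bytes: int = 128, chunk_bytes: int = 16) -> int:
    """Return which group of 4 consecutive 4-byte banks a 16-byte chunk occupies.

    32 banks x 4 bytes = 128 bytes, i.e. 8 groups of 16 bytes.
    """
    chunks_per_row = row_bytes // chunk_bytes
    col = swizzle_chunk(row, chunk, chunks_per_row) if swizzle else chunk
    byte_addr = row * row_bytes + col * chunk_bytes
    return (byte_addr // chunk_bytes) % 8


# Eight threads each read chunk 0 of rows 0..7 (a column of a 64 x float16 tile).
plain = {bank_group(r, 0, swizzle=False) for r in range(8)}
swizzled = {bank_group(r, 0, swizzle=True) for r in range(8)}
print(len(plain), len(swizzled))  # 1 8
```

Without swizzling all eight reads land in the same bank group and are serialized; with the XOR they hit eight distinct groups. Because XOR with a fixed value is a bijection, each row's chunks are only permuted, so no data is overwritten.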

LeiWang1999, Jan 20 '24 18:01