
[Roadmap] Release Plan of tilelang 0.2.0

Open · LeiWang1999 opened this issue 10 months ago · 5 comments

Release Plan for v0.2.0

  • Release Manager: TBD
  • Code Freeze Date: TBD
  • Test Verification/Bug Bash:
  • Release Date:
  • Release Note:

Features (P0)

  • [x] explicit warp specialization
  • [ ] tile scheduler
  • [x] support transposeB=False for ROCm
    • [x] Correctness Evaluation
    • [x] Layout Swizzling

Kernels (P0)

  • [x] Implement Flash MLA kernel
    • [x] init version
    • [x] optimize to SoTA
    • [x] MI300
  • [x] Implement NSA kernel
    • [x] init version
    • [x] decoding
    • [x] varlen
    • [x] fuse topk
    • [x] bwd
    • [x] MI300
  • [x] Implement Flash SeerAttention
    • [x] init version
    • [x] different q/kv sequence lengths
    • [x] varlen
    • [x] bwd
  • [x] optimize TileLang Flash Attention kernel to SoTA
    • [x] H100
    • [x] MI300
  • [ ] Complete support for commonly used attributes in Flash Attention
    • [x] varlen
    • [ ] mask/bias
    • [ ] list all supported dims (benchmark)
    • [ ] FA3 dim 256 fwd + bwd
    • [ ] FA3 bwd (64, 128)
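Several items above mention varlen support. In FlashAttention-style varlen APIs, sequences of different lengths are packed along a single token axis and located through cumulative offsets (`cu_seqlens`). The pure-Python sketch below shows only that bookkeeping, assuming TileLang's varlen kernels follow the same convention; `build_cu_seqlens` and `key_range` are hypothetical helpers, not TileLang functions.

```python
import bisect
from itertools import accumulate
from typing import List, Sequence, Tuple


def build_cu_seqlens(seqlens: Sequence[int]) -> List[int]:
    """Cumulative offsets [0, l0, l0 + l1, ...] into the packed token axis."""
    if any(n < 0 for n in seqlens):
        raise ValueError("sequence lengths must be non-negative")
    return [0, *accumulate(seqlens)]


def key_range(token: int, cu_seqlens: Sequence[int], causal: bool = True) -> Tuple[int, int]:
    """Half-open [start, end) range of packed key positions that `token` may attend to.

    Attention never crosses sequence boundaries; with causal masking a query
    only sees keys up to and including its own position.
    """
    if not 0 <= token < cu_seqlens[-1]:
        raise IndexError("token index outside the packed batch")
    # Index of the sequence containing `token` (the last offset <= token);
    # bisect_right also skips over zero-length sequences correctly.
    seq = bisect.bisect_right(cu_seqlens, token) - 1
    start = cu_seqlens[seq]
    end = token + 1 if causal else cu_seqlens[seq + 1]
    return start, end


if __name__ == "__main__":
    cu = build_cu_seqlens([3, 2, 4])  # three sequences packed into 9 tokens
    print(cu)
    for t in range(cu[-1]):
        print(t, key_range(t, cu))
```

Packing this way avoids padding every sequence to the batch maximum; a kernel only needs `cu_seqlens` to find each sequence's boundaries.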

Backends #56

  • [x] Pass and Migrate CI to H100
    • [ ] fix fp16xfp4 dequantization: testing/python/kernel/test_tilelang_kernel_dequantize_gemm.py::test_simple_impl_float16xfp4_gemm
    • [ ] fix TMA load for float32: testing/python/kernel/test_tilelang_kernel_gemm.py::test_gemm_f32f32f32_nn
  • [x] Add support for WebGPU
  • [ ] Add support for Metal
  • [ ] Add support for Hexagon

Kernels

  • [ ] compare with DeepGEMM
  • [ ] end-to-end example: kernel development flow
  • [ ] Support FP8/INT8 T.gemm
  • [ ] Add Examples to CI Test
  • [ ] optimize TileLang Flash Attention kernel to SoTA on A100

Features

  • [x] Nightly Build
  • [ ] Update API: replace all tilelang.lower calls with tilelang.compile in examples and tests.
  • [ ] Reduce LLVM dependencies
  • [ ] Provide prebuilt and PyPI packages for ROCm platforms
  • [ ] Integrate TileLang with Torch Inductor
  • [ ] Configure API access level to enable advanced features

Cost Model

  • [x] Integrate Cost Model Carver into auto-tuning

LeiWang1999 · Feb 12 '25

Any roadmap for "Integrate TileLang with Torch Inductor"? That sounds like a big project.

ZelinMa557 · Aug 27 '25

Hi, any plan to support Huawei Ascend NPU?

yuanfang-chen · Sep 26 '25

> Hi, any plan to support Huawei Ascend NPU?

Thanks for your interest! The NPU backend will be coming in a few days.

LeiWang1999 · Sep 27 '25

Any plans to support B200?

Edenzzzz · Sep 29 '25

Excited to see Hexagon on the list. I'm wondering whether reverse engineering is needed to make this happen, since the HMX interface is still not publicly accessible.

zhaoxuejun1234 · Nov 12 '25