[Roadmap] Release Plan of tilelang 0.2.0
Release Plan for v0.2.0
- Release Manager: TBD
- Code Freeze Date: TBD
- Test Verification/Bug Bash:
- Release Date:
- Release Note:
Features(P0)
- [x] explicit warp specialization
- [ ] tile scheduler
- [x] support transposeB=False for ROCm
- [x] Correctness Evaluation
- [x] Layout Swizzling
Kernels (P0)
- [x] Implement Flash MLA kernel
  - [x] init version
  - [x] optimize to SoTA
  - [x] MI300
- [x] Implement NSA kernel
  - [x] init version
  - [x] decoding
  - [x] varlen
  - [x] fuse topk
  - [x] bwd
  - [x] MI300
- [x] Implement Flash seerAttention
  - [x] init version
  - [x] different q/kv seq
  - [x] varlen
  - [x] bwd
- [x] optimize TileLang Flash Attention kernel to SoTA
  - [x] H100
  - [x] MI300
- [ ] Complete support for commonly used attributes in Flash Attention
  - [x] varlen
  - [ ] mask/bias
  - [ ] list all supported dims (benchmark)
  - [ ] fa3 dim 256 fwd + bwd
  - [ ] fa3 bwd (64, 128)
Backends #56
- [x] Pass and Migrate CI to H100
- [ ] fix fp16xfp4 dequant: testing/python/kernel/test_tilelang_kernel_dequantize_gemm.py: test_simple_impl_float16xfp4_gemm
- [ ] fix tma load for float32: testing/python/kernel/test_tilelang_kernel_gemm.py:test_gemm_f32f32f32_nn
- [x] Add support for WebGPU
- [ ] Add support for Metal
- [ ] Add support for Hexagon
Kernels
- [ ] compare with deepGemm
- [ ] e2e example: kernel develop flow
- [ ] Support FP8/INT8 T.gemm
- [ ] Add Examples to CI Test
- [ ] optimize TileLang Flash Attention kernel to SoTA on A100
Features
- [x] Nightly Build
- [ ] Update API: Replace all tilelang.lower calls with tilelang.compile in examples and tests.
- [ ] Reduce LLVM dependencies
- [ ] Provide prebuilt and PyPI packages for ROCm platforms
- [ ] Integrate TileLang with Torch Inductor
- [ ] Configure API access level to enable advanced features
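
For the API update item above, the remaining `tilelang.lower` call sites could be listed with a small script before migrating them. This is only a rough sketch using the Python standard library: the helper name `find_lower_calls`, the regular expression, and the choice of the `examples` and `testing` directories are assumptions for illustration, not part of TileLang.

```python
import re
from pathlib import Path

# Assumption: call sites are spelled literally as `tilelang.lower(`.
# Aliased imports (e.g. `from tilelang import lower`) would not be matched.
LOWER_CALL = re.compile(r"\btilelang\.lower\s*\(")


def find_lower_calls(root, subdirs=("examples", "testing")):
    """Return (path, line_number, line_text) for each `tilelang.lower(` call."""
    hits = []
    for sub in subdirs:
        base = Path(root) / sub
        if not base.is_dir():
            continue  # skip directories that do not exist in this checkout
        for path in sorted(base.rglob("*.py")):
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if LOWER_CALL.search(line):
                    hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Run from the repository root to print the remaining call sites.
    for path, lineno, line in find_lower_calls("."):
        print(f"{path}:{lineno}: {line}")
```

An empty result would suggest the item can be checked off, subject to the aliasing caveat noted in the comment.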
Cost Model
- [x] Integrate Cost Model Carver into auto-tuning
Any roadmap for "Integrate TileLang with Torch Inductor"? That sounds like a big project.
Hi, any plan to support Huawei Ascend NPU?
Thanks for your interest. The NPU backend will be coming in a few days.
Any plans to support B200?
Excited to see Hexagon on the list. I'm wondering whether reverse engineering is needed to make this happen, since the HMX interface is still unreachable.