[Language] support `T.gemm_sp_v2` on sm80 and sm89
Roadmap:
- [ ] Clear TODOs

SM8x
- [x] bf16/fp16
- [x] customized metadata layout
- [x] tf32
  - [ ] precision issue due to using fp32 as tf32
- [x] int8
- [x] fp8
- [ ] different scopes
  - [ ] sss
  - [ ] srs
  - [ ] rss
  - [ ] rrs
- [ ] metadata in register (?)

Misc
- [x] provide compression utils example
- [ ] perf optimization
  - [ ] metadata ldsm
- [x] Doc
Hi! Thank you for contributing to the TileLang project.
Please remember to run `pre-commit run --all-files` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.
We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!
Walkthrough
Adds a new gemm_sp_v2 TileLang API and TL operator (GemmSPPy and GemmSPWarpPolicy), a SparseTensorCore intrinsics emitter and many SP layout helpers, refactors metadata layout API to make_cutlass_metadata_layout, updates examples/benchmarks/tests to use the new API, and exposes related utilities and tests.
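As background for the compression and metadata pieces of this walkthrough: sparse tensor cores on SM8x consume 2:4 structured sparsity, i.e. two non-zeros in every group of four elements, with the kept positions recorded as 2-bit indices in a metadata buffer. A minimal pure-Python sketch of that idea (illustrative only; the in-tree utilities live in `tilelang/utils/sparse.py` and use the Cutlass metadata encoding, and the function name and packing below are made up for this example):

```python
def compress_2to4(row):
    """Keep the 2 largest-magnitude values in each group of 4.

    Returns (values, metadata): the compressed values, and one
    metadata byte per group packing the two kept column indices
    as 2-bit fields (toy encoding, not the Cutlass layout).
    """
    assert len(row) % 4 == 0
    values, metadata = [], []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        # Indices of the two largest-magnitude entries, kept in column order.
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        values.extend(group[i] for i in keep)
        metadata.append(keep[0] | (keep[1] << 2))
    return values, metadata

vals, meta = compress_2to4([0.0, 1.5, 0.0, -2.0, 3.0, 0.0, 0.5, 0.0])
# group 0 keeps columns 1 and 3 -> values [1.5, -2.0], meta 0b1101
# group 1 keeps columns 0 and 2 -> values [3.0, 0.5],  meta 0b1000
```

The real compressor additionally has to reorder the metadata into the hardware-expected swizzled layout, which is what the `make_cutlass_metadata_layout` refactor below addresses.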
Changes
| Cohort / File(s) | Summary |
|---|---|
| FFI / C++ op & reflection: `src/op/gemm_sp.cc`, `src/op/gemm_sp.h` | Register `tl.GemmSPWarpPolicy`, add reflection and FFI entry `GemmSPWarpPolicyComputeWarpPartition`; add reflection registration function. |
| C++ TileOp wiring: `src/op/gemm_sp_py.cc`, `src/op/gemm_sp_py.h` | New `GemmSPPy` TileOperator: arg deserialization, `Clone`, `GetGemmInst`/`CheckWGMMA`, `Lower`/`InferLayout`, TL op registration and static reflection init. |
| TileLang public API: `tilelang/language/experimental/gemm_sp.py`, `tilelang/language/__init__.py` | Add and export `gemm_sp_v2` (argument legalization, strides/offsets, dispatch to `tl.gemm_sp_py`). |
| TileOp layer (Python): `tilelang/tileop/gemm_sp/*`, `tilelang/tileop/__init__.py` | New `GemmSPPy` export; add `GemmSPBase` (accessors) and `GemmSPMMA` (`infer_layout` and `lower` for ss/sr/rs/rr). |
| Sparse emitter & layouts: `tilelang/intrinsics/mma_sp_macro_generator.py`, `tilelang/intrinsics/mma_sp_layout.py`, `tilelang/intrinsics/mma_layout.py`, `tilelang/intrinsics/mma_macro_generator.py` | Add `SparseTensorCoreIntrinEmitter` (ldmatrix/loads/mma_sp/stmatrix), many SP layout helpers, and 32x8→16x16 load layouts for non-ldmatrix fallbacks. |
| Layout API refactor: `tilelang/layout/gemm_sp.py`, `tilelang/layout/__init__.py` | Replace `make_metadata_layout` with `make_cutlass_metadata_layout`, remove backend arg, add SM90/SM8x creators and arch dispatch; update callers. |
| TileLang IR / policy: `tilelang/ir.py` | Add `GemmSPWarpPolicy` object and `compute_warp_partition(..., bits)` delegating to FFI. |
| TileOp typing: `tilelang/tileop/gemm/__init__.py` | Add type hints to `gemm_py_infer_layout` and `gemm_py_lower` signatures. |
| Utils: tensor & sparse: `tilelang/utils/tensor.py`, `tilelang/utils/sparse.py` | Add `is_float8`, `fp8_remove_negative_zeros_`, extend `TensorSupplyType`; add `randint_semi_sparse` and dtype-aware compress/randn behavior. |
| Templates / Debug / Common: `src/tl_templates/cuda/debug.h`, `src/tl_templates/cuda/common.h` | Add `debug_print_buffer_value<uint16_t>` specialization and new `make_int4(short...)` overload. |
| Benchmarks / Examples: `benchmark/matmul/benchmark_matmul_sp.py`, `examples/gemm_sp/*`, `examples/sparse_tensorcore/tilelang_example_sparse_tensorcore.py`, `examples/gemm_sp/example_custom_compress.py` | Update `matmul_sp` signature to accept `in_dtype` and call `T.gemm_sp_v2`; switch to `make_cutlass_metadata_layout`; add example custom compressor and config constants; adjust imports and CLI defaults. |
| Tests: `examples/gemm_sp/test_example_gemm_sp.py`, `testing/python/tilelibrary/test_tilelang_tilelibrary_gemm_sp.py`, `testing/python/tilelibrary/test_tilelang_tilelibrary_gemm_sp_v2.py` | Add example tests; add/refactor comprehensive gemm_sp and gemm_sp_v2 tests with dtype-aware input generators and dense-reference comparisons. |
| Docs / Index: `docs/deeplearning_operators/matmul_sparse.md`, `docs/index.md` | Add new matmul_sparse documentation and register it in docs index. |
| Profiler exports: `tilelang/profiler/__init__.py` | Import and re-export `is_float8` from `tilelang.utils.tensor` (replace local impl). |
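One utility worth a note: `fp8_remove_negative_zeros_` exists because FP8 has a signed zero, and in both e4m3 and e5m2 the byte with only the sign bit set (`0x80`) encodes `-0.0`, which can trip up bit-level comparisons against a dense reference. A byte-level sketch of the idea (the in-tree helper presumably operates in place on tensors; this standalone version with a hypothetical name works on raw encoding bytes):

```python
def remove_negative_zeros(fp8_bytes):
    """Map the FP8 encoding of -0.0 (0x80) to +0.0 (0x00).

    Works for both e4m3 and e5m2: in either format, the byte with
    only the sign bit set encodes negative zero.
    """
    return bytes(0x00 if b == 0x80 else b for b in fp8_bytes)

cleaned = remove_negative_zeros(bytes([0x80, 0x3F, 0x00, 0x80]))
# -> b'\x00\x3f\x00\x00'
```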
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant TileLang as gemm_sp_v2
    participant TL_Op as GemmSPPy
    participant TileOp as GemmSPMMA
    participant Emitter as SparseTensorCoreIntrinEmitter
    User->>TileLang: call gemm_sp_v2(A_sparse, E, B, C, ...)
    TileLang->>TL_Op: construct GemmSPPy node (buffers, args)
    TL_Op->>TileOp: infer_layout(target, thread_nums)
    TileOp->>Emitter: build emitter for pattern (ss/sr/rs/rr)
    Emitter->>Emitter: ldmatrix / make_mma_load_layout / mma_sp / stmatrix
    Emitter-->>TileOp: fragment/layout map
    TileOp-->>TL_Op: layout_map
    TL_Op->>TileOp: lower(target, thread_nums, thread_var)
    TileOp->>Emitter: emit lowering -> prim_func
    TileOp-->>TL_Op: lowered kernel
    TL_Op-->>User: compiled kernel handle
```
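The `compute_warp_partition` step in this flow decides how the available warps are split across the M and N tile dimensions; the real logic sits in C++ behind the `GemmSPWarpPolicyComputeWarpPartition` FFI entry. As a rough illustration of what such a policy computes, here is a toy version assuming the FullRow/FullCol/Square policy names used by TileLang's dense `GemmWarpPolicy` (a simplified sketch, not the in-tree algorithm):

```python
def compute_warp_partition(policy, num_warps):
    """Toy warp partition: split num_warps into (m_warps, n_warps)."""
    if policy == "FullRow":   # all warps tile the M dimension
        return num_warps, 1
    if policy == "FullCol":   # all warps tile the N dimension
        return 1, num_warps
    if policy == "Square":    # most balanced divisor pair
        m = max(d for d in range(1, num_warps + 1)
                if num_warps % d == 0 and d * d <= num_warps)
        return m, num_warps // m
    raise ValueError(f"unknown policy: {policy}")

print(compute_warp_partition("Square", 8))  # -> (2, 4)
```

The actual policy additionally takes the operand bit width (`bits`) into account, which this sketch ignores.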
Estimated code review effort
🎯 5 (Critical) | ⏱️ ~120 minutes
- Attention points:
  - `tilelang/intrinsics/mma_sp_macro_generator.py`: large emitter with dense index/thread math and dtype-conditional branches.
  - `tilelang/tileop/gemm_sp/gemm_sp_mma.py`: multiple kernel variants, warp partitioning, lowering complexity.
  - `src/op/gemm_sp_py.cc` / `src/op/gemm_sp_py.h`: FFI deserialization, pointer/stride handling and lowering hooks.
  - `tilelang/layout/gemm_sp.py`: metadata layout math, arch dispatch, FP8 handling.
  - benchmark/examples/tests: confirm API updates (`matmul_sp` signature, `in_dtype` propagation, `make_cutlass_metadata_layout`, `T.gemm_sp_v2`).
Possibly related issues
- tile-ai/tilelang#935: New gemm_sp_v2 API and Cutlass metadata layout migration align with this issue's objectives.
Possibly related PRs
- tile-ai/tilelang#691: Modifies the same benchmark matmul path and metadata/layout handling; likely overlap.
- tile-ai/tilelang#783: Related GEMM-SP operator, warp-policy, and FFI binding changes.
- tile-ai/tilelang#1129: Overlaps low-level CUDA template integer-packing helpers (related `make_int4` additions).
Suggested reviewers
- LeiWang1999
- chengyupku
Poem
I hopped through fragments, layouts bright,
I packed int4s and tuned the thread's flight.
gemm_sp_v2 sings sparse delight,
Metadata mapped, kernels take flight.
A rabbit cheers; sparse math feels light!
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.18%, which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and specifically describes the main change: adding support for T.gemm_sp_v2 on SM80 and SM89 architectures, which aligns with the core objective documented in the PR. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
We're good to go once the conflict is resolved; I think we can then let this PR in.