
[Feature]: Add benchmark scripts for examples

Open · yyttt6 opened this issue 1 month ago • 5 comments

Summary by CodeRabbit

  • New Features

    • Added a unified benchmarking system across examples with per-example benchmark entry points, a bench-all runner, and automated aggregation.
    • Generates performance reports: markdown table and plotted chart (image) for visual comparison and speedup ranking.
    • CI now exposes benchmark outputs (table + embedded plot) for PRs.
  • Chores

    • Updated CI workflow permissions and standardized installation steps for performance runs.

✏️ Tip: You can customize this high-level summary in your review settings.

yyttt6 · Nov 19 '25 11:11

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `pre-commit run --all-files` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

github-actions[bot] · Nov 19 '25 11:11

Walkthrough

Adds a lightweight benchmarking framework (`tilelang.testing.benchmark`), many example-level bench runners and `run_regression_perf` entrypoints, replaces the perf CI workflow with a new PR-triggered workflow and an updated `maint/scripts/ci_performance.py`, and generates `bench.md`/`bench.png`. Several example files contain duplicated function insertions.
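For concreteness, here is a minimal sketch of what one of these `run_regression_perf` entrypoints might look like. Only the `do_bench` import is confirmed by this summary; the `do_bench(fn)` call shape, the milliseconds convention, and the placeholder matmul workload are assumptions.

```python
import torch
from tilelang.profiler import do_bench  # the import this PR adds across examples


def run_regression_perf() -> float:
    # Placeholder workload standing in for an example's real kernel.
    a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    # Time repeated launches and return a single latency number (assumed: ms).
    return do_bench(lambda: a @ b)
```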

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Benchmark core**<br>`tilelang/testing/benchmark.py`, `tilelang/testing/__init__.py`, `maint/scripts/bench_entry.py` | New benchmark framework exposing `bench()` and `process_func()`, with multiprocessing isolation, record aggregation, and PNG/markdown output; `bench` exported in the testing package; bench entry script added. |
| **CI workflows & script**<br>`.github/workflows/pr-perfbench-bot.yml` (removed), `.github/workflows/pr-regression-test-bot.yml`, `maint/scripts/ci_performance.py` | Removed old workflow; added PR-triggered workflow that checks out the merged ref and main, installs both versions, and runs `ci_performance.py`; `ci_performance.py` rewritten to run commands, parse outputs, compute speedups, emit `bench.md` and `bench.png`, and expose `run_cmd()`/`draw()`. |
| **Bench runner scripts**<br>`examples/**/bench_*.py` (many, e.g., attention_sink, flash_attention, blocksparse_*, gemm, dequantize_gemm, deepseek_*, convolution, elementwise, gemv, linear_attention, sparse_tensorcore, topk, warp_specialize, fusedmoe, ...) | ~50 new bench wrapper scripts that call `tilelang.testing.benchmark.process_func`, each exposing `bench_*` functions and a `__main__` guard calling `tilelang.testing.bench()` (see the sketch after this table). |
| **Example modules: perf entrypoints**<br>`examples/**/example_*.py` (many files) | Added `run_regression_perf` (or similar) functions across numerous example modules to enable programmatic benchmarking; many files include duplicated/identical insertions (redefinitions). |
| **Profiler imports & minor formatting**<br>`examples/**` (various) | Added `from tilelang.profiler import do_bench` in many examples; minor whitespace/import formatting tweaks in a few files. |
| **Artifacts**<br>`bench.md`, `bench.png` (generated) | CI produces a markdown table comparing Original vs. Current latencies (`bench.md`) and a PNG visualization (`bench.png`). |
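Below is a hedged sketch of what one such wrapper could look like. Only the names `tilelang.testing.benchmark.process_func`, the `bench_*` convention, and the `__main__` guard calling `tilelang.testing.bench()` come from this summary; the argument shape of `process_func` and the `example_gemm` import are assumptions.

```python
import tilelang.testing
from tilelang.testing.benchmark import process_func

from example_gemm import run_regression_perf  # the example's perf entrypoint (assumed path)


def bench_example_gemm():
    # Described behavior: run the target in an isolated subprocess and
    # record the latency it reports.
    return process_func(run_regression_perf)


if __name__ == "__main__":
    # Assumed to discover and run the bench_* functions defined in this module.
    tilelang.testing.bench()
```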

Sequence Diagram(s)

sequenceDiagram
    participant GH as GitHub (comment)
    participant WF as Workflow (pr-regression-test-bot)
    participant Runner as Self-hosted Runner
    participant CI as ci_performance.py
    participant Bench as tilelang.testing.benchmark
    participant Worker as Multiprocess Worker
    participant Kernel as Example Kernel
    participant Viz as matplotlib

    GH->>WF: comment "/perf" (issue_comment)
    WF->>Runner: checkout PR merge ref and main
    Runner->>Runner: setup Python envs, install merged & original
    Runner->>CI: run maint/scripts/ci_performance.py
    CI->>Bench: invoke bench_all / bench
    Bench->>Worker: spawn worker per bench target (multiprocess)
    Worker->>Kernel: load example module, call run_regression_perf / kernel-only
    Kernel-->>Worker: return latency record
    Worker-->>Bench: send latency record
    Bench-->>CI: aggregated bench.md content
    CI->>Viz: draw() → produce bench.png
    WF->>GH: post PR comment with bench.md and artifact link

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Focus areas for review:

  • `tilelang/testing/benchmark.py`: multiprocessing lifecycle, CUDA context handling, error propagation, and record aggregation (see the sketch after this list).
  • `maint/scripts/ci_performance.py`: command execution, regex parsing, numeric conversions, speedup computation, artifact paths.
  • Examples: duplicate `run_regression_perf`/bench function insertions across many files; de-duplicate and confirm exported symbols.
  • Workflow: environment setup, virtualenv isolation, permissions, and artifact upload steps.
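For the multiprocessing concern above, here is a minimal sketch of the isolation pattern the walkthrough describes (one spawned worker per bench target); everything beyond "multiprocessing isolation" is an assumption.

```python
import multiprocessing as mp


def _worker(target, queue):
    # Run the bench target inside the child and ship its latency record back.
    queue.put(target())


def run_isolated(target):
    # "spawn" gives the child a fresh interpreter, so it initializes its own
    # CUDA context instead of inheriting the parent's.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(target, queue))
    proc.start()
    record = queue.get()  # read before join() to avoid a full-pipe deadlock
    proc.join()
    return record
```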

Possibly related PRs

  • tile-ai/tilelang#973: touches the perfbench CI workflow and is directly related to removing/replacing the old workflow.
  • tile-ai/tilelang#971: overlaps with the CI perf benchmarking workflow changes and comment-trigger behavior.
  • tile-ai/tilelang#853: modifies the attention_sink examples that are targeted by the new benchmark wrappers.

Suggested reviewers

  • LeiWang1999

"🐰 With whiskers twitching I time each run,
Hopping from kernel to kernel, having fun.
I log and plot, then nibble a carrot sweet,
Benchmarks in hand β€” hop, measure, repeat! πŸ₯•"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 1.83%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title '[Feature]: Add benchmark scripts for examples' clearly and concisely describes the main change: adding benchmark scripts. It is specific and directly related to the changeset. |
✨ Finishing touches
  • [ ] πŸ“ Generate docstrings
πŸ§ͺ Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] · Nov 19 '25 11:11

/perf

yyttt6 · Nov 19 '25 11:11

/perf

yyttt6 · Nov 19 '25 11:11

/perf

yyttt6 · Nov 19 '25 11:11