[Language] Adds a random number generation capability

Open Cunxiao2002 opened this issue 1 month ago • 4 comments

Summary by CodeRabbit

  • New Features

    • Added random number generation support for tensor buffers, enabling 1D and 2D buffer population with Philox-based random values.
  • Tests

    • Added comprehensive test suite validating random number generation across multiple implementations and configurations with numerical accuracy checks.

Cunxiao2002, Nov 05 '25 12:11

[!NOTE]

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The PR introduces Philox-based random number generation support to TileLang, spanning the Python API layer, C++ intrinsic registration, CUDA codegen integration, and the device-side implementation, alongside 1D and 2D test cases that validate cross-implementation consistency.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Python API & Tests | `tilelang/language/__init__.py`, `tilelang/language/random.py` | Added public `rand()` function for Philox RNG with shape validation; internal `_rand_parallel_impl()` macro wrapper; re-exported `rand` at package level. |
| Test Suite | `testing/python/language/test_tilelang_language_rand.py` | New test module with 1D/2D RNG kernels, cross-validation against Triton and Torch, seeded dropout path, and CUDA-gated test execution. |
| C++ Builtin Infrastructure | `src/op/builtin.h`, `src/op/builtin.cc` | Added `philox_rand()` intrinsic declaration and TL builtin registration with 6-input signature and opaque call effect. |
| CUDA Codegen | `src/target/codegen_cuda.h`, `src/target/codegen_cuda.cc` | Introduced `need_random_h_` flag; integrated `philox_rand` intrinsic handling into CallNode emission and Finish phase, including conditional `random.h` header emission. |
| CUDA Device Implementation | `src/tl_templates/cuda/random.h` | Added Philox RNG kernel (`philox_rand`), helper functions (`umulhi_uint32`, `philox_impl_device`, `uint32_to_uniform_float_device`), with per-thread work distribution and bounds checking. |
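
To make the device-side helpers named in the table easier to review, here is a minimal Python sketch of a Philox-4x32 round function and a uint32-to-float conversion. It is not the code from src/tl_templates/cuda/random.h. The function names only mirror the helpers listed above, and the constants are the standard Philox-4x32 ones. Two things are assumptions borrowed from Triton's convention, since the tests cross-validate against Triton: the seed and counter layout (seed split into two key words, element offset in the first counter word) and the fold-and-scale float conversion.

```python
MASK32 = 0xFFFFFFFF

# Standard Philox-4x32 key-increment and round-multiplier constants.
PHILOX_KEY_A = 0x9E3779B9
PHILOX_KEY_B = 0xBB67AE85
PHILOX_ROUND_A = 0xD2511F53
PHILOX_ROUND_B = 0xCD9E8D57


def umulhi_uint32(a: int, b: int) -> int:
    """Return the high 32 bits of the 64-bit product of two uint32 values."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def philox_impl(c0, c1, c2, c3, k0, k1, n_rounds=10):
    """Run n_rounds of Philox-4x32 on a 128-bit counter with a 64-bit key."""
    for _ in range(n_rounds):
        prev_c0, prev_c2 = c0, c2
        c0 = umulhi_uint32(PHILOX_ROUND_B, prev_c2) ^ c1 ^ k0
        c2 = umulhi_uint32(PHILOX_ROUND_A, prev_c0) ^ c3 ^ k1
        c1 = (PHILOX_ROUND_B * prev_c2) & MASK32  # low 32 bits of the product
        c3 = (PHILOX_ROUND_A * prev_c0) & MASK32
        k0 = (k0 + PHILOX_KEY_A) & MASK32         # bump the key every round
        k1 = (k1 + PHILOX_KEY_B) & MASK32
    return c0, c1, c2, c3


def uint32_to_uniform_float(x: int) -> float:
    """Map a uint32 to [0, 1): fold to a non-negative int32, then scale."""
    x &= MASK32
    if x >= 0x80000000:            # negative when reinterpreted as int32
        x = MASK32 - x             # same as -x_signed - 1
    return x * 4.6566127342e-10    # scale factor slightly below 2**-31


def rand(seed: int, offset: int, n_rounds: int = 10) -> float:
    """Assumed layout: 64-bit seed as the key, element offset as counter word 0."""
    k0, k1 = seed & MASK32, (seed >> 32) & MASK32
    r0, _, _, _ = philox_impl(offset & MASK32, 0, 0, 0, k0, k1, n_rounds)
    return uint32_to_uniform_float(r0)
```

Because the generator is counter-based, each element's value depends only on `(seed, offset, n_rounds)`. That is why per-thread work distribution on the device can be checked element by element against a host-side reference such as this one.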

Sequence Diagram

sequenceDiagram
    participant PyUser as Python User
    participant PyAPI as tilelang.language.rand()
    participant Macro as _rand_parallel_impl()
    participant TVMCall as T.call_intrin(philox_rand)
    participant Codegen as CUDA Codegen
    participant DevKernel as Device Kernel (philox_rand)
    
    PyUser->>PyAPI: rand(buffer, seed, n_rounds)
    activate PyAPI
    PyAPI->>PyAPI: Validate shape (1D or 2D)
    PyAPI->>PyAPI: Compute total_elems, block_m, block_n
    PyAPI->>Macro: _rand_parallel_impl(buffer, seed, ...)
    deactivate PyAPI
    
    activate Macro
    Macro->>TVMCall: T.call_intrin(philox_rand, ...)
    deactivate Macro
    
    TVMCall->>Codegen: Emit CUDA kernel call
    activate Codegen
    Codegen->>Codegen: Set need_random_h_ flag
    Codegen->>Codegen: Emit extern call to tl::philox_rand
    Codegen->>Codegen: Include random.h in Finish
    deactivate Codegen
    
    Codegen->>DevKernel: Execute on CUDA device
    activate DevKernel
    DevKernel->>DevKernel: Initialize Philox state from seed
    DevKernel->>DevKernel: Run philox_impl_device (n_rounds)
    DevKernel->>DevKernel: Convert uint32 → uniform float
    DevKernel->>DevKernel: Write to output buffer
    deactivate DevKernel
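
The codegen steps in the diagram (set a flag when the intrinsic is seen, emit the call, include the header at finish time) can be modelled in a few lines of Python. This is a toy illustration of that flag lifecycle only, not TileLang's C++ codegen: the class, its method names, and the include string are hypothetical, and only the `tl::philox_rand` call name and the idea of a `need_random_h_` flag come from the walkthrough above.

```python
class ToyCudaCodegen:
    """Toy model of a code generator that emits a header only when it is needed."""

    # Hypothetical include line; the real path is whatever the PR's codegen emits.
    RANDOM_HEADER = '#include "random.h"'

    def __init__(self):
        self.need_random_h = False  # stands in for the need_random_h_ flag
        self.body = []

    def visit_call(self, op_name, args):
        """Emit one call statement, noting whether the RNG header is required."""
        if op_name == "philox_rand":
            self.need_random_h = True
            self.body.append(f"tl::philox_rand({', '.join(args)});")
        else:
            self.body.append(f"{op_name}({', '.join(args)});")

    def finish(self):
        """Prepend the header only if some call needed it, then return the source."""
        headers = [self.RANDOM_HEADER] if self.need_random_h else []
        return "\n".join(headers + self.body)


cg = ToyCudaCodegen()
cg.visit_call("philox_rand", ["buf", "total_elems", "block_m", "block_n", "seed", "n_rounds"])
print(cg.finish())
```

The point worth checking in the real codegen is the same one the toy makes visible: the flag must be set during call emission and read only in the finish step, so kernels that never call the intrinsic do not pull in the header.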

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • src/tl_templates/cuda/random.h: Device-side Philox implementation with bit-manipulation logic (umulhi_uint32, state transforms) requires verification of cryptographic soundness and numerical correctness.
  • tilelang/language/random.py: Shape validation logic and parameter derivation (block_m, block_n calculation) merit careful review.
  • src/target/codegen_cuda.cc/h: Integration points with existing codegen patterns; verify flag lifecycle and header inclusion correctness.
  • testing/python/language/test_tilelang_language_rand.py: Cross-implementation validation logic (1D, 2D, seeded dropout) and tolerance thresholds (atol/rtol ~1e-3) warrant scrutiny for floating-point equivalence claims.
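
As a reference point for the shape-validation item above, the following sketch shows one way the 1D/2D check and the derivation of `total_elems`, `block_m`, and `block_n` could look. It is an assumption-laden illustration, not the code in tilelang/language/random.py: the helper names are hypothetical, a 1D buffer is assumed to be treated as a single row, and offsets are assumed to be row-major.

```python
def normalize_rand_shape(shape):
    """Return (total_elems, block_m, block_n) for a 1D or 2D buffer shape."""
    if len(shape) == 1:
        block_m, block_n = 1, shape[0]  # assumption: 1D is viewed as one row
    elif len(shape) == 2:
        block_m, block_n = shape
    else:
        raise ValueError(f"rand() expects a 1D or 2D buffer, got {len(shape)}D")
    if block_m <= 0 or block_n <= 0:
        raise ValueError(f"buffer extents must be positive, got {tuple(shape)}")
    return block_m * block_n, block_m, block_n


def flat_offset(row, col, block_n):
    """Row-major element offset (assumed); this would feed the Philox counter."""
    return row * block_n + col


total, m, n = normalize_rand_shape((4, 16))
offsets = [flat_offset(r, c, n) for r in range(m) for c in range(n)]
# Every element should receive a distinct offset covering 0 .. total - 1.
assert offsets == list(range(total))
```

A check like the final assertion is a cheap way to catch 2D indexing mistakes, since a wrong stride shows up as duplicated or missing offsets before any random values are compared.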

Poem

🐰 A Philox of seeds we now shall sow,
With random bits in CUDA's glow,
Across the tensors, fast and free,
From one-dee to two-dee spree,
Round and round the cryptos go! 🎲

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.74% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main change: adding random number generation capability. It is concise, clear, and accurately reflects the primary purpose of the pull request across multiple files. |
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebd64c367b6f0cdcfad50c0591a9d426abdc2cab and e381855a03bb7203cb7075363eedab6eb8a57ae2.

📒 Files selected for processing (3)
  • src/op/builtin.cc (1 hunks)
  • src/tl_templates/cuda/random.h (1 hunks)
  • testing/python/language/test_tilelang_language_rand.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
testing/python/language/test_tilelang_language_rand.py (6)
src/tl_templates/cuda/random.h (1)
  • tl (5-93)
tilelang/env.py (1)
  • disable_cache (271-272)
tilelang/jit/kernel.py (1)
  • out_idx (461-462)
tilelang/language/allocate.py (1)
  • alloc_fragment (59-70)
tilelang/language/random.py (1)
  • rand (25-40)
tilelang/language/parallel.py (1)
  • Parallel (9-29)
🪛 Ruff (0.14.3)
testing/python/language/test_tilelang_language_rand.py

27-27: Unused function argument: dtype

(ARG001)


50-50: Unused function argument: dtype

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)

coderabbitai[bot], Nov 05 '25 12:11

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

github-actions[bot], Nov 05 '25 12:11

It seems there are still some issues when the buffer is 2D...

Cunxiao2002, Nov 07 '25 18:11

@codex review

LeiWang1999, Nov 08 '25 14:11