[Language] Adds a random number generation capability
Summary by CodeRabbit
- New Features
  - Added random number generation support for tensor buffers, enabling 1D and 2D buffer population with Philox-based random values.
- Tests
  - Added a test suite that validates random number generation across multiple implementations and configurations, with numerical accuracy checks.
Walkthrough
The PR introduces Philox-based random number generation support to TileLang, spanning Python API layer, C++ intrinsic registration, CUDA codegen integration, and device-side implementation, alongside comprehensive 1D and 2D test cases validating cross-implementation consistency.
Changes
| Cohort / File(s) | Summary |
|---|---|
| Python API & Tests: tilelang/language/__init__.py, tilelang/language/random.py | Added public rand() function for Philox RNG with shape validation; internal _rand_parallel_impl() macro wrapper; re-exported rand at package level. |
| Test Suite: testing/python/language/test_tilelang_language_rand.py | New test module with 1D/2D RNG kernels, cross-validation against Triton and Torch, seeded dropout path, and CUDA-gated test execution. |
| C++ Builtin Infrastructure: src/op/builtin.h, src/op/builtin.cc | Added philox_rand() intrinsic declaration and TL builtin registration with 6-input signature and opaque call effect. |
| CUDA Codegen: src/target/codegen_cuda.h, src/target/codegen_cuda.cc | Introduced need_random_h_ flag; integrated philox_rand intrinsic handling into CallNode emission and Finish phase, including conditional random.h header emission. |
| CUDA Device Implementation: src/tl_templates/cuda/random.h | Added Philox RNG kernel (philox_rand), helper functions (umulhi_uint32, philox_impl_device, uint32_to_uniform_float_device), with per-thread work distribution and bounds checking. |
Sequence Diagram
sequenceDiagram
participant PyUser as Python User
participant PyAPI as tilelang.language.rand()
participant Macro as _rand_parallel_impl()
participant TVMCall as T.call_intrin(philox_rand)
participant Codegen as CUDA Codegen
participant DevKernel as Device Kernel (philox_rand)
PyUser->>PyAPI: rand(buffer, seed, n_rounds)
activate PyAPI
PyAPI->>PyAPI: Validate shape (1D or 2D)
PyAPI->>PyAPI: Compute total_elems, block_m, block_n
PyAPI->>Macro: _rand_parallel_impl(buffer, seed, ...)
deactivate PyAPI
activate Macro
Macro->>TVMCall: T.call_intrin(philox_rand, ...)
deactivate Macro
TVMCall->>Codegen: Emit CUDA kernel call
activate Codegen
Codegen->>Codegen: Set need_random_h_ flag
Codegen->>Codegen: Emit extern call to tl::philox_rand
Codegen->>Codegen: Include random.h in Finish
deactivate Codegen
Codegen->>DevKernel: Execute on CUDA device
activate DevKernel
DevKernel->>DevKernel: Initialize Philox state from seed
DevKernel->>DevKernel: Run philox_impl_device (n_rounds)
DevKernel->>DevKernel: Convert uint32 → uniform float
DevKernel->>DevKernel: Write to output buffer
deactivate DevKernel
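The device-side steps in the diagram (seed to Philox state, n_rounds of mixing, uint32 to uniform float) can be sketched on the host in plain Python. This is only a reference-style sketch, not the code in random.h: the round structure and constants are the standard Philox-4x32 ones, but the seed-to-key split, the counter layout (element index in the first counter word), the float mapping, and the name `rand_reference` are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

# Standard Philox-4x32 multiplier and key-increment (Weyl) constants.
PHILOX_ROUND_A = 0xD2511F53
PHILOX_ROUND_B = 0xCD9E8D57
PHILOX_KEY_A = 0x9E3779B9
PHILOX_KEY_B = 0xBB67AE85


def umulhi_uint32(a, b):
    """High 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def philox4x32(counter, key, n_rounds=10):
    """Apply n_rounds Philox-4x32 rounds to a 4-word counter with a 2-word key."""
    c0, c1, c2, c3 = (c & MASK32 for c in counter)
    k0, k1 = (k & MASK32 for k in key)
    for _ in range(n_rounds):
        prev_c0, prev_c2 = c0, c2
        c0 = umulhi_uint32(PHILOX_ROUND_B, prev_c2) ^ c1 ^ k0
        c2 = umulhi_uint32(PHILOX_ROUND_A, prev_c0) ^ c3 ^ k1
        c1 = (PHILOX_ROUND_B * prev_c2) & MASK32  # low 32 bits of the product
        c3 = (PHILOX_ROUND_A * prev_c0) & MASK32
        k0 = (k0 + PHILOX_KEY_A) & MASK32  # bump the key every round
        k1 = (k1 + PHILOX_KEY_B) & MASK32
    return c0, c1, c2, c3


def uint32_to_uniform_float(x):
    """Map a uint32 to [0, 1). Assumed mapping: fold signed int32 to non-negative, then scale."""
    x &= MASK32
    if x >= 1 << 31:  # reinterpret the bits as a signed int32
        x -= 1 << 32
    if x < 0:
        x = -x - 1
    return x * 4.6566127342e-10  # slightly below 2**-31, so the result stays under 1.0


def rand_reference(n, seed, n_rounds=10):
    """Hypothetical host reference: element i uses counter (i, 0, 0, 0)."""
    key = (seed & MASK32, (seed >> 32) & MASK32)  # assumed split of a 64-bit seed
    return [
        uint32_to_uniform_float(philox4x32((i, 0, 0, 0), key, n_rounds)[0])
        for i in range(n)
    ]
```

Because the generator is counter-based, each element depends only on its index and the seed, which is what lets a kernel distribute elements across threads without shared state.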
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
- src/tl_templates/cuda/random.h: Device-side Philox implementation with bit-manipulation logic (umulhi_uint32, state transforms) requires verification of statistical soundness and numerical correctness; Philox is a counter-based PRNG, not a cryptographic primitive.
- tilelang/language/random.py: Shape validation logic and parameter derivation (block_m, block_n calculation) merit careful review.
- src/target/codegen_cuda.cc/h: Integration points with existing codegen patterns; verify flag lifecycle and header inclusion correctness.
- testing/python/language/test_tilelang_language_rand.py: Cross-implementation validation logic (1D, 2D, seeded dropout) and tolerance thresholds (atol/rtol ~1e-3) warrant scrutiny for floating-point equivalence claims.
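To make the tolerance concern in the last item concrete, here is a minimal sketch of the element-wise check that atol/rtol comparisons of this kind perform, together with a dropout mask derived from uniform samples. The helper names `allclose` and `dropout` are hypothetical and are not taken from the test module; the comparison rule is the usual `|actual - expected| <= atol + rtol * |expected|`.

```python
def allclose(actual, expected, atol=1e-3, rtol=1e-3):
    """Element-wise |a - e| <= atol + rtol * |e| over two equal-length sequences."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


def dropout(values, uniforms, p):
    """Keep values[i] where uniforms[i] > p, scaled by 1 / (1 - p); zero the rest."""
    scale = 1.0 / (1.0 - p)
    return [v * scale if u > p else 0.0 for v, u in zip(values, uniforms)]


# Two generators that agree to within the tolerance pass the check ...
print(allclose([0.5, 0.25], [0.5004, 0.2501]))
# ... while a larger deviation fails it.
print(allclose([0.5], [0.51]))
print(dropout([1.0, 2.0, 3.0], [0.1, 0.6, 0.9], p=0.5))
```

One caveat worth checking in review: a tolerance on the uniform samples does not carry over to the dropout output, because a sample within 1e-3 of the threshold `p` can flip an element between kept and dropped.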
Poem
🐰 A Philox of seeds we now shall sow,
With random bits in CUDA's glow,
Across the tensors, fast and free,
From one-dee to two-dee spree,
Round and round the cryptos go! 🎲
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.74%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main change: adding random number generation capability. It is concise, clear, and accurately reflects the primary purpose of the pull request across multiple files. |
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between ebd64c367b6f0cdcfad50c0591a9d426abdc2cab and e381855a03bb7203cb7075363eedab6eb8a57ae2.
📒 Files selected for processing (3)
- src/op/builtin.cc (1 hunks)
- src/tl_templates/cuda/random.h (1 hunks)
- testing/python/language/test_tilelang_language_rand.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
testing/python/language/test_tilelang_language_rand.py (6)
- src/tl_templates/cuda/random.h (1): tl (5-93)
- tilelang/env.py (1): disable_cache (271-272)
- tilelang/jit/kernel.py (1): out_idx (461-462)
- tilelang/language/allocate.py (1): alloc_fragment (59-70)
- tilelang/language/random.py (1): rand (25-40)
- tilelang/language/parallel.py (1): Parallel (9-29)
🪛 Ruff (0.14.3)
testing/python/language/test_tilelang_language_rand.py
- 27-27: Unused function argument: dtype (ARG001)
- 50-50: Unused function argument: dtype (ARG001)
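For readers unfamiliar with ARG001, the pattern Ruff flags and one way to resolve it look roughly like this. The function names are hypothetical and do not reproduce the kernels in the test file; the sketch assumes the argument was meant to select the output element type.

```python
def fill_unused(n, seed, dtype="float32"):
    # `dtype` is accepted but never read, which is what ARG001 reports.
    return [float(seed + i) for i in range(n)]


def fill_typed(n, seed, dtype="float32"):
    # One fix: actually use the argument to pick the element type.
    caster = {"float32": float, "int32": int}[dtype]
    return [caster(seed + i) for i in range(n)]


print(fill_typed(3, 1, dtype="int32"))
```

If the argument is genuinely unneeded, removing it from the signature (and from every call site) is the other common resolution.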
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
👋 Hi! Thank you for contributing to the TileLang project.
Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.
We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
It seems there are still some issues when the buffer is 2D...
@codex review