Add blockwise fp8 gemm kernel
Summary
- add CUDA kernels for blockwise FP8 matmul
- wire up new FFI and Rust bindings
- provide a helper `fp8_blockwise_gemm` and a test
- compile the new kernels only on CUDA compute capability >= 8.0
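To make the semantics of the new operation concrete, here is a minimal CPU reference model of a blockwise-scaled GEMM. The real kernel fuses dequantization into the matmul on device; this sketch applies one scale per (BLOCK × BLOCK) weight tile naively. The tiny block size and the (n, k) weight layout are assumptions for illustration, not taken from the PR.

```rust
// Reference (CPU) model of blockwise FP8 GEMM: each (BLOCK x BLOCK) tile of the
// quantized weight matrix shares one dequantization scale. FP8 storage is
// simulated with f32 values here; only the scaling math is demonstrated.
const BLOCK: usize = 2; // tiny demo block size (real kernels use e.g. 128)

/// C[i][j] = sum_l A[i][l] * (Wq[j][l] * scale[j/BLOCK][l/BLOCK])
fn blockwise_gemm(a: &[Vec<f32>], wq: &[Vec<f32>], scale: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let m = a.len();
    let k = a[0].len();
    let n = wq.len();
    let mut c = vec![vec![0.0f32; n]; m];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for l in 0..k {
                // per-tile scale applied to the quantized weight value
                acc += a[i][l] * wq[j][l] * scale[j / BLOCK][l / BLOCK];
            }
            c[i][j] = acc;
        }
    }
    c
}

fn main() {
    let a = vec![vec![1.0, 2.0], vec![3.0, 4.0]]; // 2x2 activations
    let wq = vec![vec![1.0, 1.0], vec![2.0, 2.0]]; // 2x2 quantized weights
    let scale = vec![vec![0.5]]; // a single 2x2 block with scale 0.5
    let c = blockwise_gemm(&a, &wq, &scale);
    // row 0: (1*1 + 2*1)*0.5 = 1.5, (1*2 + 2*2)*0.5 = 3.0
    assert_eq!(c[0], vec![1.5, 3.0]);
    assert_eq!(c[1], vec![3.5, 7.0]);
    println!("{:?}", c);
}
```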
Testing
- `cargo test -p mistralrs-quant test_blockwise_fp8_gemm --features=cuda` (fails: could not fetch `candle-core` as a dependency due to network issues)
Summary by CodeRabbit
- New Features
  - Added support for blockwise FP8 matrix multiplication (GEMM) on CUDA, enabling efficient computation with FP8 weights and multiple input/output precisions (FP16, BF16, FP32).
  - Introduced a new operation for blockwise FP8 GEMM, accessible via a public function.
- Tests
  - Added tests to validate the new blockwise FP8 GEMM operation against reference outputs.
Walkthrough
This change introduces blockwise FP8 GEMM (general matrix multiplication) support into the codebase. It adds CUDA kernel implementations and dummy stubs, updates the build system for conditional compilation based on compute capability, extends the FFI interface, and implements a Rust-side operation with validation and tests.
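The conditional compilation described above can be sketched as a build-time selection between the real and dummy kernel sources. The function name and the capability-probing details are illustrative assumptions; the two `.cu` paths are the ones listed in the changes table below.

```rust
// Hypothetical sketch of the selection build.rs performs: compile the real
// kernel only when the detected compute capability is at least 8.0 (sm_80),
// otherwise fall back to the asserting dummy stubs.
fn blockwise_gemm_kernel_source(cc: u32) -> &'static str {
    // `cc` is compute capability * 10, e.g. 80 for sm_80
    if cc >= 80 {
        "kernels/blockwise_fp8/blockwise_fp8_gemm.cu"
    } else {
        "kernels/blockwise_fp8/blockwise_fp8_gemm_dummy.cu"
    }
}

fn main() {
    assert_eq!(
        blockwise_gemm_kernel_source(90),
        "kernels/blockwise_fp8/blockwise_fp8_gemm.cu"
    );
    assert_eq!(
        blockwise_gemm_kernel_source(75),
        "kernels/blockwise_fp8/blockwise_fp8_gemm_dummy.cu"
    );
    println!("kernel selection ok");
}
```

In the real build script, the same decision would also drive the value of the `HAVE_BLOCKWISE_GEMM_KERNELS` constant exposed through the FFI module.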
Changes
| File(s) | Change Summary |
|---|---|
| mistralrs-quant/build.rs, mistralrs-quant/src/blockwise_fp8/ffi.rs | Build script now conditionally compiles real or dummy blockwise FP8 GEMM kernels and updates the HAVE_BLOCKWISE_GEMM_KERNELS constant in FFI based on CUDA capability. FFI declarations for three new kernel launcher functions were added, along with the constant. |
| mistralrs-quant/kernels/blockwise_fp8/blockwise_fp8_gemm.cu | Added CUDA kernel and three launcher functions for blockwise FP8 GEMM supporting FP16, BF16, and FP32 input/output. Implements matrix multiplication with blockwise scaling and mixed precision, exposing C-callable entry points. |
| mistralrs-quant/kernels/blockwise_fp8/blockwise_fp8_gemm_dummy.cu | Added dummy CUDA file with stubbed launcher functions for blockwise FP8 GEMM, each asserting on use. Used when the compute capability is insufficient. |
| mistralrs-quant/src/blockwise_fp8/ops.rs | Introduced Fp8BlockwiseGemm struct and implemented the CustomOp3 trait. Added the fp8_blockwise_gemm function, handling validation, kernel invocation, and output allocation for the new operation. Extended the test suite with a CUDA test for blockwise FP8 GEMM. |
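The validation mentioned for `ops.rs` presumably includes checking that the per-block scale tensor covers the weight matrix, one scale per (block × block) tile. A hedged sketch of that check, assuming a hypothetical block size and ceiling division for partial edge tiles:

```rust
// Sketch of a scale-shape check a blockwise FP8 op might run before launching
// the kernel. The block size and error format are assumptions for the sketch.
fn expected_scale_shape(n: usize, k: usize, block: usize) -> (usize, usize) {
    // ceiling division so partial edge tiles still get a scale
    ((n + block - 1) / block, (k + block - 1) / block)
}

fn validate_scales(n: usize, k: usize, scale_shape: (usize, usize), block: usize) -> Result<(), String> {
    let want = expected_scale_shape(n, k, block);
    if scale_shape != want {
        return Err(format!("scale shape {:?} does not match expected {:?}", scale_shape, want));
    }
    Ok(())
}

fn main() {
    // a 512x4096 weight with 128x128 blocks needs a 4x32 scale tensor
    assert_eq!(expected_scale_shape(512, 4096, 128), (4, 32));
    assert!(validate_scales(512, 4096, (4, 32), 128).is_ok());
    assert!(validate_scales(512, 4096, (4, 31), 128).is_err());
    println!("validation sketch ok");
}
```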
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant RustOp as Fp8BlockwiseGemm (Rust)
    participant FFI as FFI Layer
    participant CUDA as CUDA Kernel
    RustOp->>RustOp: Validate input tensors
    RustOp->>FFI: Call launch_gemm_fp8_blockwise_kernel_* (based on dtype)
    FFI->>CUDA: Launch CUDA kernel with pointers and parameters
    CUDA-->>FFI: Compute output in device memory
    FFI-->>RustOp: Return output buffer
    RustOp->>RustOp: Return output tensor and shape
```
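The dtype-based dispatch step in the diagram can be sketched as a simple match. The `launch_gemm_fp8_blockwise_kernel_*` prefix comes from this PR's FFI summary; the exact per-dtype suffixes here are assumptions.

```rust
// Illustrative dispatch: the Rust op selects an FFI launcher by the
// activation/output dtype before crossing into CUDA.
#[derive(Clone, Copy)]
enum DType {
    F16,
    BF16,
    F32,
}

fn launcher_name(dt: DType) -> &'static str {
    match dt {
        DType::F16 => "launch_gemm_fp8_blockwise_kernel_f16",
        DType::BF16 => "launch_gemm_fp8_blockwise_kernel_bf16",
        DType::F32 => "launch_gemm_fp8_blockwise_kernel_f32",
    }
}

fn main() {
    assert_eq!(launcher_name(DType::BF16), "launch_gemm_fp8_blockwise_kernel_bf16");
    println!("dispatch sketch ok");
}
```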
Poem
In the warren where numbers hop and leap,
Blockwise FP8 kernels now dig deep!
CUDA streams and tensors align,
Matrix bunnies multiply just fine.
If your GPU’s old, don’t fret or cry—
The dummy kernel just says “Goodbye!”
🐇✨
[!WARNING]
Review ran into problems
🔥 Problems
Git: Failed to clone repository. Please run the `@coderabbitai full review` command to re-trigger a full review. If the issue persists, set `path_filters` to include or exclude specific files.
Code Metrics Report

```text
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                3           62           53            0            9
 Dockerfile              1           41           22           10            9
 JSON                   12          107          106            0            1
 Makefile                1            6            5            0            1
 Python                 84         3713         3163          140          410
 Shell                   1           63           26           18           19
 Plain Text              3         3723            0         2413         1310
 TOML                   19          557          512            6           39
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       3            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               55         5002            0         3812         1190
 |- BASH                 8          104          101            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                22          757          634            1          122
 |- TOML                 2           75           63            0           12
 (Total)                           6071          919         3813         1339
-------------------------------------------------------------------------------
 Rust                  378       126689       113088         2587        11014
 |- Markdown           171         2145           29         1913          203
 (Total)                         128834       113117         4500        11217
===============================================================================
 Total                 562       139984       116994         8988        14002
===============================================================================
```