mistral.rs
FlashMLA support
Support FlashMLA to improve throughput for MLA (Multi-head Latent Attention) models (DeepSeek V2, V3/R1) on CUDA.
https://github.com/EricLBuehler/candle/pull/74
https://github.com/deepseek-ai/FlashMLA
Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           34           29            0            5
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Makefile                1            6            5            0            1
 Python                 73         3126         2710           85          331
 Shell                   1           58           22           18           18
 Plain Text              3         3723            0         2413         1310
 TOML                   19          531          492            2           37
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               50         4205            0         3196         1009
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                17          586          495            0           91
 |- TOML                 2           75           63            0           12
 (Total)                           5102          779         3196         1127
-------------------------------------------------------------------------------
 Rust                  339       112404       100684         2173         9547
 |- Markdown           158         1808           25         1642          141
 (Total)                         114212       100709         3815         9688
===============================================================================
 Total                 507       124254       104087         7899        12268
===============================================================================