FlashMLA support

Open • EricLBuehler opened this issue 1 year ago • 1 comment

Add FlashMLA support to improve throughput on CUDA for models that use multi-head latent attention (MLA), such as DeepSeek V2 and V3/R1.

https://github.com/EricLBuehler/candle/pull/74

https://github.com/deepseek-ai/FlashMLA

Feb 25 '25 02:02 EricLBuehler

Code Metrics Report

===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           34           29            0            5
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Makefile                1            6            5            0            1
 Python                 73         3126         2710           85          331
 Shell                   1           58           22           18           18
 Plain Text              3         3723            0         2413         1310
 TOML                   19          531          492            2           37
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               50         4205            0         3196         1009
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                17          586          495            0           91
 |- TOML                 2           75           63            0           12
 (Total)                           5102          779         3196         1127
-------------------------------------------------------------------------------
 Rust                  339       112404       100684         2173         9547
 |- Markdown           158         1808           25         1642          141
 (Total)                         114212       100709         3815         9688
===============================================================================
 Total                 507       124254       104087         7899        12268
===============================================================================

Feb 25 '25 02:02 github-actions[bot]