mistral.rs

KV Cache Quantization

Open • EricLBuehler opened this issue 1 year ago • 1 comment

  • [x] Metal kernels
    • [x] Quantize (f32, f16, bf16) -> (q4_0, q8_0)
    • [x] Dequantize (q4_0, q8_0) -> (f32, f16, bf16)
  • [ ] CUDA kernels
    • [ ] Quantize (f32, f16, bf16) -> (q4_0, q8_0)
    • [ ] Dequantize (q4_0, q8_0) -> (f32, f16, bf16)
  • [x] KV cache quantization
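
For reference, a minimal CPU sketch of what the quantize/dequantize pair in the checklist computes for q8_0. It assumes the GGML-style layout (blocks of 32 values sharing one scale, with 8-bit signed quants). The names `Q8Block`, `quantize_q8_0`, and `dequantize_q8_0` are hypothetical and are not taken from the mistral.rs codebase, and the scale is kept as f32 here for simplicity where GGML stores it as f16. The Metal and CUDA kernels would do the same arithmetic per block in parallel.

```rust
/// Values per block, following the GGML q8_0 convention (assumed here).
const BLOCK_SIZE: usize = 32;

/// Hypothetical q8_0-style block: one shared scale plus 32 signed 8-bit quants.
/// GGML stores the scale as f16; f32 is used in this sketch to avoid extra crates.
#[derive(Debug, Clone, PartialEq)]
struct Q8Block {
    scale: f32,
    quants: [i8; BLOCK_SIZE],
}

/// Quantize f32 values into q8_0-style blocks.
/// Panics if the input length is not a multiple of BLOCK_SIZE.
fn quantize_q8_0(values: &[f32]) -> Vec<Q8Block> {
    assert!(
        values.len() % BLOCK_SIZE == 0,
        "input length must be a multiple of the block size"
    );
    values
        .chunks_exact(BLOCK_SIZE)
        .map(|chunk| {
            // The largest magnitude in the block maps to +/-127.
            let amax = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            let scale = amax / 127.0;
            // An all-zero block gets scale 0 and all-zero quants.
            let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
            let mut quants = [0i8; BLOCK_SIZE];
            for (q, v) in quants.iter_mut().zip(chunk) {
                *q = (v * inv).round() as i8;
            }
            Q8Block { scale, quants }
        })
        .collect()
}

/// Dequantize blocks back to f32: each value is quant * scale.
fn dequantize_q8_0(blocks: &[Q8Block]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.quants.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}
```

Under these assumptions a round trip loses at most half a quantization step (`scale / 2`) per value, which is the error a quantized KV cache trades for memory. q4_0 follows the same pattern with 4-bit quants packed two per byte.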

EricLBuehler, Dec 11 '24 20:12

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 69         2926         2534           77          315
 Shell                   1           58           22           18           18
 Plain Text              3         3723            0         2413         1310
 TOML                   18          627          556            2           69
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               46         3802            0         2891          911
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                15          512          433            0           79
 |- TOML                 2           75           63            0           12
 (Total)                           4625          717         2891         1017
-------------------------------------------------------------------------------
 Rust                  309        99706        89368         1933         8405
 |- Markdown           149         1690           25         1540          125
 (Total)                         101396        89393         3473         8530
===============================================================================
 Total                 467       111044        92653         7346        11045
===============================================================================
  

github-actions[bot], Dec 11 '24 20:12