mistral.rs
KV Cache Quantization
- [x] Metal kernels
  - [x] Quantize (f32, f16, bf16) -> (q4_0, q8_0)
  - [x] Dequantize (q4_0, q8_0) -> (f32, f16, bf16)
- [ ] CUDA kernels
  - [ ] Quantize (f32, f16, bf16) -> (q4_0, q8_0)
  - [ ] Dequantize (q4_0, q8_0) -> (f32, f16, bf16)
- [x] KV cache quantization
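
For readers unfamiliar with the formats named in the checklist, here is a minimal CPU-side sketch of a q8_0-style round trip. It assumes the GGML-style layout (blocks of 32 values sharing one scale) and is not the Metal kernel code from this change. The names `BlockQ8`, `quantize_q8`, and `dequantize_q8` are hypothetical, and the scale is kept as `f32` rather than `f16` so the example needs only the standard library.

```rust
/// Values per block, assuming the GGML-style q8_0 layout.
const BLOCK: usize = 32;

/// One q8_0-style block: a shared scale plus 32 signed 8-bit values.
/// (Hypothetical type; GGML stores the scale as f16, f32 is used here.)
#[derive(Debug, Clone, PartialEq)]
struct BlockQ8 {
    scale: f32,
    qs: [i8; BLOCK],
}

/// Quantize f32 values into blocks. The input length must be a multiple of 32.
fn quantize_q8(xs: &[f32]) -> Vec<BlockQ8> {
    assert!(xs.len() % BLOCK == 0, "length must be a multiple of 32");
    xs.chunks_exact(BLOCK)
        .map(|chunk| {
            // The largest magnitude in the block maps to +/-127.
            let amax = chunk.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
            let scale = amax / 127.0;
            // An all-zero block gets scale 0; avoid dividing by it.
            let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
            let mut qs = [0i8; BLOCK];
            for (q, &x) in qs.iter_mut().zip(chunk) {
                *q = (x * inv).round().clamp(-127.0, 127.0) as i8;
            }
            BlockQ8 { scale, qs }
        })
        .collect()
}

/// Dequantize blocks back to f32 by multiplying each value by its block scale.
fn dequantize_q8(blocks: &[BlockQ8]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.qs.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}
```

In this sketch each reconstructed value differs from the original by at most about half the block's scale, which is why a per-block scale (rather than one scale for the whole tensor) keeps the error small for K/V tensors whose magnitude varies between blocks. A q4_0-style variant would follow the same shape but pack two 4-bit values per byte.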
Code Metrics Report
================================================================================
Language               Files        Lines         Code     Comments       Blanks
================================================================================
C Header                   2           35           28            0            7
Dockerfile                 1           41           22           10            9
JSON                      12          105          104            0            1
Python                    69         2926         2534           77          315
Shell                      1           58           22           18           18
Plain Text                 3         3723            0         2413         1310
TOML                      18          627          556            2           69
YAML                       2           21           19            2            0
--------------------------------------------------------------------------------
Jupyter Notebooks          4            0            0            0            0
|- Markdown                2           77           32           31           14
|- Python                  2          205          178            1           26
(Total)                               282          210           32           40
--------------------------------------------------------------------------------
Markdown                  46         3802            0         2891          911
|- BASH                    6          103          100            0            3
|- JSON                    1           12           12            0            0
|- Python                  7          121          109            0           12
|- Rust                   15          512          433            0           79
|- TOML                    2           75           63            0           12
(Total)                              4625          717         2891         1017
--------------------------------------------------------------------------------
Rust                     309        99706        89368         1933         8405
|- Markdown              149         1690           25         1540          125
(Total)                            101396        89393         3473         8530
================================================================================
Total                    467       111044        92653         7346        11045
================================================================================