Eric Buehler
This is part 1 of 3 chunks of #2615 for easier review. It synchronizes the GGML kernels to support bf16. I will open the others sequentially as this one is...
Implements: https://github.com/deepseek-ai/FlashMLA - Tests are added and passing!
This PR integrates kernel developments from: https://github.com/ml-explore/mlx/pull/1924. Specifically, our `candle_nn::ops::sdpa` function now dispatches to optimized implementations for the cases with and without prompts. There is also an option for causal masking, removing...
- [x] Add vision part
  - mmproj
- [x] Implement forward pass
- [x] Add inputs processor (pan & scan, normalize, etc.)
- [x] Example
This PR refactors the quantization parts of candle-core a bit and integrates some new I-quants! There is no CUDA or Metal support yet; perhaps we could add that in a...
- [x] Prefix cacher handles pagedattention seqs
- [x] Updated sequence logic
- [ ] Updated inputs processing

Currently not working yet.
## Summary
- add CUDA kernels for blockwise FP8 matmul
- wire up new FFI and Rust bindings
- provide helper `fp8_blockwise_gemm` and test
- compile new kernels only on...
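
To illustrate what a blockwise-scaled matmul computes, here is a minimal CPU reference sketch. It is not the PR's CUDA kernel or the actual `fp8_blockwise_gemm` signature: the function name `blockwise_gemm_ref`, the row-major layouts, and the square-block scale grid are all assumptions made for illustration, and `f32` stands in for the FP8 storage type.

```rust
/// Hypothetical reference for a blockwise-scaled matmul: y = x * dequant(w)^T.
///
/// Assumed layouts (not taken from the PR):
/// - `x` is (m, k), row-major.
/// - `w_q` is (n, k), row-major, holding quantized values (f32 stands in for FP8).
/// - `scales` is (ceil(n / block), ceil(k / block)), row-major, one scale per
///   `block x block` tile of the weight matrix.
fn blockwise_gemm_ref(
    x: &[f32],
    w_q: &[f32],
    scales: &[f32],
    m: usize,
    n: usize,
    k: usize,
    block: usize,
) -> Vec<f32> {
    let scale_rows = (n + block - 1) / block;
    let scale_cols = (k + block - 1) / block;
    assert_eq!(x.len(), m * k);
    assert_eq!(w_q.len(), n * k);
    assert_eq!(scales.len(), scale_rows * scale_cols);

    let mut y = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                // Look up the scale of the tile that contains weight element (j, p).
                let s = scales[(j / block) * scale_cols + p / block];
                // Dequantize on the fly, then accumulate the dot product.
                acc += x[i * k + p] * (w_q[j * k + p] * s);
            }
            y[i * n + j] = acc;
        }
    }
    y
}
```

A real kernel would typically apply each scale once per tile of partial sums rather than per element; the per-element form above is only meant to make the arithmetic easy to check.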
## Summary
- implement `FromStr` for `IsqType`
- delegate ISQ string parsing to `IsqType::from_str`

## Testing
- `cargo fmt` *(fails: `rustfmt` not installed)*
- `cargo check --locked` *(fails: could not...
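
The `FromStr` pattern described in the summary can be sketched roughly as follows. `DemoIsqType`, its variants, the accepted strings, and the `parse_isq` helper are hypothetical stand-ins, not the real `IsqType` definition or its parsing rules.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for `IsqType`; the variant names are illustrative only.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DemoIsqType {
    Q4_0,
    Q8_0,
    Q4K,
}

impl FromStr for DemoIsqType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Trim and lowercase so that " Q4K " and "q4k" parse to the same variant.
        match s.trim().to_ascii_lowercase().as_str() {
            "q4_0" => Ok(Self::Q4_0),
            "q8_0" => Ok(Self::Q8_0),
            "q4k" => Ok(Self::Q4K),
            other => Err(format!("unknown ISQ type `{other}`")),
        }
    }
}

/// Hypothetical call site: it delegates to the trait implementation instead of
/// keeping its own copy of the string-to-variant table.
fn parse_isq(s: &str) -> Result<DemoIsqType, String> {
    s.parse::<DemoIsqType>()
}
```

Implementing the standard trait means every caller can use `str::parse`, so the mapping from strings to variants lives in one place.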