Eric Buehler issues

Results: 136 issues of Eric Buehler

Sync with GGML: add GGML bf16 support

This is part 1 of 3 chunks of #2615 for easier review. It synchronizes the GGML kernels to support bf16. I will open the others sequentially as this one is...
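For readers unfamiliar with the format, bf16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE-754 f32. The sketch below is a plain-Rust illustration of that conversion with round-to-nearest-even. It is not taken from the PR or from the GGML kernels; the function names `f32_to_bf16_bits` and `bf16_bits_to_f32` are hypothetical.

```rust
/// Convert an f32 to bf16 bits using round-to-nearest-even.
/// Hypothetical helper for illustration; not the GGML or candle code.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep the sign and exponent, and set a mantissa bit so the
        // result stays a NaN after truncation.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit: this rounds to nearest and
    // breaks ties toward an even result.
    let lsb = (bits >> 16) & 1;
    let rounded = bits.wrapping_add(0x7FFF + lsb);
    (rounded >> 16) as u16
}

/// Widen bf16 bits back to f32 by zero-filling the low 16 bits.
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}
```

Widening is exact, so any value that is representable in bf16 survives a round trip unchanged; other values lose their low mantissa bits.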

Add FlashMLA

Implements: https://github.com/deepseek-ai/FlashMLA - Tests are added and passing!

Integrate MLX SDPA kernels with mask

This PR integrates kernel developments from: https://github.com/ml-explore/mlx/pull/1924. Specifically, our `candle_nn::ops::sdpa` function now dispatches to optimized implementations both with and without prompts. There is also an option for causal masking, removing...
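As a reference for what scaled dot-product attention with an optional causal mask computes, here is a minimal single-head sketch in plain Rust. It is not the `candle_nn::ops::sdpa` signature and not the MLX kernels; `sdpa_reference`, its parameters, and the nested-`Vec` layout are assumptions made for illustration only.

```rust
/// Dot product of two equal-length slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Naive single-head attention: softmax(scale * Q K^T) V.
/// q: [q_len][d], k: [kv_len][d], v: [kv_len][dv].
/// Hypothetical reference function, not the candle API.
fn sdpa_reference(
    q: &[Vec<f32>],
    k: &[Vec<f32>],
    v: &[Vec<f32>],
    scale: f32,
    causal: bool,
) -> Vec<Vec<f32>> {
    let (q_len, kv_len) = (q.len(), k.len());
    assert_eq!(kv_len, v.len(), "k and v must have the same length");
    let dv = v.first().map_or(0, |row| row.len());
    // When kv_len > q_len (for example with cached keys), align the
    // last query with the last key.
    let offset = kv_len.saturating_sub(q_len);
    let mut out = vec![vec![0.0f32; dv]; q_len];
    for i in 0..q_len {
        // With causal masking, query i only sees keys up to its own position.
        let visible = if causal { (i + offset + 1).min(kv_len) } else { kv_len };
        let scores: Vec<f32> = (0..visible).map(|j| scale * dot(&q[i], &k[j])).collect();
        // Subtract the row maximum before exponentiating for stability.
        let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        for (j, e) in exps.iter().enumerate() {
            let w = e / sum;
            for c in 0..dv {
                out[i][c] += w * v[j][c];
            }
        }
    }
    out
}
```

Passing `causal = true` here plays the role of an explicit lower-triangular mask: masked positions are simply never scored, so no mask tensor needs to be built.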

Implement the Gemma 3 vision models!

- [x] Add vision part - mmproj
- [x] Implement forward pass
- [x] Add inputs processor (pan & scan, normalize, etc.)
- [x] Example

Implement I-quants (IQ4XS, IQ4NL)

This PR refactors the quantization parts of candle-core a bit and integrates some new I-quants! There is no CUDA or Metal support yet; perhaps we could add that in a...

Prefix caching for PagedAttention

- [x] Prefix cacher handles PagedAttention seqs
- [x] Updated sequence logic
- [ ] Updated inputs processing

Not working yet.

Labels: optimization, paged-attention
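To make the idea of prefix caching over fixed-size KV blocks concrete, here is a small self-contained sketch. It is not the implementation in this PR (which is marked as not working yet); the `PrefixCache` type, its methods, and the chained-hash scheme are assumptions chosen for illustration, and hash collisions are ignored for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical prefix cache: maps a chained hash of each full token
/// block to a physical block id.
struct PrefixCache {
    block_size: usize,
    blocks: HashMap<u64, usize>,
    next_block_id: usize,
}

impl PrefixCache {
    fn new(block_size: usize) -> Self {
        Self { block_size, blocks: HashMap::new(), next_block_id: 0 }
    }

    /// Hash each full block together with the previous block's hash,
    /// so a block only matches when the entire prefix before it matches.
    fn chained_hashes(&self, tokens: &[u32]) -> Vec<u64> {
        let mut hashes = Vec::new();
        let mut prev: u64 = 0;
        for chunk in tokens.chunks_exact(self.block_size) {
            let mut hasher = DefaultHasher::new();
            prev.hash(&mut hasher);
            chunk.hash(&mut hasher);
            prev = hasher.finish();
            hashes.push(prev);
        }
        hashes
    }

    /// Register every full block of a sequence in the cache.
    fn insert(&mut self, tokens: &[u32]) {
        for hash in self.chained_hashes(tokens) {
            if !self.blocks.contains_key(&hash) {
                self.blocks.insert(hash, self.next_block_id);
                self.next_block_id += 1;
            }
        }
    }

    /// Return the block ids of the longest cached prefix of `tokens`.
    fn match_prefix(&self, tokens: &[u32]) -> Vec<usize> {
        let mut matched = Vec::new();
        for hash in self.chained_hashes(tokens) {
            match self.blocks.get(&hash) {
                Some(&id) => matched.push(id),
                None => break,
            }
        }
        matched
    }
}
```

A new sequence would call `match_prefix` first, reuse the returned blocks, and only compute attention state for the remaining tokens. Trailing tokens that do not fill a whole block are never cached in this sketch.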

Experiments with mistralrs-codex

Add blockwise fp8 gemm kernel

Fix AFQ8 parsing for --isq

Proper documentation with Sphinx