candle
Minimalist ML framework for Rust
## Issue

Inference with the `sentence-transformers/all-MiniLM-L6-v2` model using `candle-transformers` (v0.8.4) is significantly slower (approximately 8.5x) than the standard PyTorch implementation using the `transformers` library. The average time per...
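A comparison like this hinges on how the per-inference time is measured. Below is a minimal, hypothetical timing harness in plain Rust using `std::time::Instant`; `run_inference` is a placeholder standing in for the actual candle forward pass, and the warm-up/iteration pattern is an assumption about how such numbers would typically be collected, not the reporter's actual benchmark.

```rust
use std::time::Instant;

// Placeholder for the real candle forward pass; simulated work so
// the harness is self-contained.
fn run_inference() -> usize {
    (0..1_000_000usize).sum::<usize>()
}

fn main() {
    let iters = 10;
    // Warm-up run so one-time setup cost (weight loading, kernel
    // compilation) is not counted in the average.
    run_inference();

    let start = Instant::now();
    for _ in 0..iters {
        run_inference();
    }
    let avg_ms = start.elapsed().as_secs_f64() * 1000.0 / iters as f64;
    println!("average time per inference: {avg_ms:.2} ms");
}
```

Averaging over several iterations after a warm-up matters here, since a single cold run would fold model-loading cost into the comparison.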
(All numbers were measured on an M1 Max with samply, a sampling CPU profiler.) The performance numbers I'll share are for the following model: ```rust const N_EMBD: usize =...
Added the LRN (Local Response Normalization) operator with tests. Related issue: https://github.com/huggingface/candle/issues/2852
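For context, a minimal sketch of what ONNX-style LRN computes across channels, written in plain Rust over a single spatial position. This is an illustration of the formula, not the candle PR's actual implementation; the function name and signature are hypothetical.

```rust
// ONNX-style Local Response Normalization across channels:
//   out[c] = in[c] / (bias + alpha/size * sum of in[c']^2)^beta
// where the sum runs over a window of `size` channels centered on c.
fn lrn(input: &[f32], size: usize, alpha: f32, beta: f32, bias: f32) -> Vec<f32> {
    let c_len = input.len();
    let half_lo = (size - 1) / 2; // floor((size-1)/2) channels below c
    let half_hi = size / 2;       // ceil((size-1)/2) channels above c
    input
        .iter()
        .enumerate()
        .map(|(c, &x)| {
            let lo = c.saturating_sub(half_lo);
            let hi = (c + half_hi).min(c_len - 1);
            let square_sum: f32 = input[lo..=hi].iter().map(|v| v * v).sum();
            x / (bias + alpha / size as f32 * square_sum).powf(beta)
        })
        .collect()
}
```

With `size = 1`, the window is just the channel itself, which makes the normalization easy to check by hand.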
To improve the maintainability of `candle-metal-kernels` I think it is about time we introduce a `build.rs` to the project. This won't necessarily affect performance, but it will let us reuse...
ref: https://github.com/huggingface/candle/blob/main/candle-examples/examples/onnx/main.rs I see that the underlying implementation uses candle throughout, so theoretically it should be supported, right?
This is part 1 of 3 chunks of #2615 for easier review. It synchronizes the GGML kernels to support bf16. I will open the others sequentially as this one is...
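For readers unfamiliar with bf16: it is simply the top 16 bits of an IEEE-754 f32 (same sign and 8-bit exponent, but only a 7-bit mantissa), which is why syncing kernels to support it is mostly a matter of conversion and storage. Below is a hedged, std-only sketch of the round-to-nearest-even conversion trick; it is illustrative, not the GGML kernels' actual code, and it ignores NaN edge cases.

```rust
// f32 -> bf16: keep the top 16 bits, rounding to nearest-even on the
// 16 bits being dropped.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    // Adding 0x7FFF plus the least significant kept bit implements
    // round-to-nearest, ties-to-even, before truncating.
    let rounded = bits.wrapping_add(0x7FFF + ((bits >> 16) & 1));
    (rounded >> 16) as u16
}

// bf16 -> f32 is exact: shift back into the high half and zero-fill.
fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}
```

Values whose mantissa fits in 7 bits (like 1.0 or -2.0) round-trip exactly; everything else loses only low-order mantissa bits.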
Implements: https://github.com/deepseek-ai/FlashMLA - Tests are added and passing!
This PR integrates kernel developments from: https://github.com/ml-explore/mlx/pull/1924. Specifically, our `candle_nn::ops::sdpa` function now dispatches to optimized implementations for both the prompt and non-prompt cases. There is also an option for causal masking, removing...
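To make the semantics concrete, here is a naive, unoptimized reference sketch of scaled dot-product attention with optional causal masking, written in plain Rust over `(seq_len x dim)` row-major matrices. This is a didactic baseline under assumed shapes, not the fused Metal kernels the PR dispatches to; the function name is hypothetical.

```rust
// Naive scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
// with future positions masked to -inf when `causal` is set.
fn sdpa(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>], causal: bool) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .enumerate()
        .map(|(i, qi)| {
            // Attention scores for query i against every key.
            let mut scores: Vec<f32> = k
                .iter()
                .enumerate()
                .map(|(j, kj)| {
                    if causal && j > i {
                        f32::NEG_INFINITY // mask out future positions
                    } else {
                        qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale
                    }
                })
                .collect();
            // Numerically stable softmax over the scores.
            let m = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let sum: f32 = scores
                .iter_mut()
                .map(|s| {
                    *s = (*s - m).exp();
                    *s
                })
                .sum();
            // Weighted sum of the value rows.
            let dim = v[0].len();
            (0..dim)
                .map(|c| scores.iter().zip(v).map(|(w, vj)| w / sum * vj[c]).sum())
                .collect()
        })
        .collect()
}
```

The point of the optimized kernels is to compute exactly this result without materializing the full score matrix; a reference like the above is mainly useful for correctness checks.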
First off, great job on this project! I noticed that Gemm (at least for CUDA, from what I've seen) is not supported for tensors with more than 4 dims; however, the...
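The usual workaround for such a limitation is to collapse all leading "batch" dimensions into a single one, run the batched matmul, and then restore the original shape. A shape-only sketch of that fold, assuming (and this is an assumption, not candle's API) that only the last two dims participate in the matmul:

```rust
// Fold every leading dim of an N-dim shape into one batch dim,
// keeping the trailing two matrix dims untouched, e.g.
// [2, 3, 4, 5, 6] -> batch 24, matrix [5, 6].
fn flatten_leading_dims(shape: &[usize]) -> (usize, Vec<usize>) {
    let (batch, mat) = shape.split_at(shape.len() - 2);
    (batch.iter().product(), mat.to_vec())
}
```

Because the data is contiguous in row-major order, this reshape is free; only the inverse reshape is needed after the Gemm.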
Add qwen3.rs to candle-transformers.

- Fixed mod.rs
- Added qwen3.rs

I used the Qwen3 Python implementation in transformers as a reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen3/configuration_qwen3.py @LaurentMazare Could you review the code?