candle
Minimalist ML framework for Rust
## Issue

Inference with the `sentence-transformers/all-MiniLM-L6-v2` model using `candle-transformers` (v0.8.4) is significantly slower (approximately 8.5x) than the standard PyTorch implementation using the `transformers` library. The average time per...
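A comparison like this hinges on how the per-inference time is measured. Below is a minimal, hypothetical timing harness in plain Rust using `std::time::Instant`; `run_inference` is a placeholder standing in for the actual candle forward pass, and the warm-up/iteration pattern is an assumption about how such numbers would typically be collected, not the reporter's actual benchmark.

```rust
use std::time::Instant;

// Placeholder for the real candle forward pass; simulated work so
// the harness is self-contained.
fn run_inference() -> usize {
    (0..1_000_000usize).sum::<usize>()
}

fn main() {
    let iters = 10;
    // Warm-up run so one-time setup cost (weight loading, kernel
    // compilation) is not counted in the average.
    run_inference();

    let start = Instant::now();
    for _ in 0..iters {
        run_inference();
    }
    let avg_ms = start.elapsed().as_secs_f64() * 1000.0 / iters as f64;
    println!("average time per inference: {avg_ms:.2} ms");
}
```

Averaging over several iterations after a warm-up matters here, since a single cold run would fold model-loading cost into the comparison.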
(All numbers were measured on an M1 Max with samply, a sampling CPU profiler.) The performance numbers I'll share are for the following model: ```rust const N_EMBD: usize =...
Added the LRN (Local Response Normalization) operator with tests. Related issue: https://github.com/huggingface/candle/issues/2852
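For context, a minimal sketch of what ONNX-style LRN computes across channels, written in plain Rust over a single spatial position. This is an illustration of the formula, not the candle PR's actual implementation; the function name and signature are hypothetical.

```rust
// ONNX-style Local Response Normalization across channels:
//   out[c] = in[c] / (bias + alpha/size * sum of in[c']^2)^beta
// where the sum runs over a window of `size` channels centered on c.
fn lrn(input: &[f32], size: usize, alpha: f32, beta: f32, bias: f32) -> Vec<f32> {
    let c_len = input.len();
    let half_lo = (size - 1) / 2; // floor((size-1)/2) channels below c
    let half_hi = size / 2;       // ceil((size-1)/2) channels above c
    input
        .iter()
        .enumerate()
        .map(|(c, &x)| {
            let lo = c.saturating_sub(half_lo);
            let hi = (c + half_hi).min(c_len - 1);
            let square_sum: f32 = input[lo..=hi].iter().map(|v| v * v).sum();
            x / (bias + alpha / size as f32 * square_sum).powf(beta)
        })
        .collect()
}
```

With `size = 1`, the window is just the channel itself, which makes the normalization easy to check by hand.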
To improve the maintainability of `candle-metal-kernels` I think it is about time we introduce a `build.rs` to the project. This won't necessarily affect performance, but it will let us reuse...
ref: https://github.com/huggingface/candle/blob/main/candle-examples/examples/onnx/main.rs I see that the underlying implementation uses candle throughout, so theoretically it should be supported, right?
This is part 1 of 3 chunks of #2615 for easier review. It synchronizes the GGML kernels to support bf16. I will open the others sequentially as this one is...
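For readers unfamiliar with bf16: it is simply the top 16 bits of an IEEE-754 f32 (same sign and 8-bit exponent, but only a 7-bit mantissa), which is why syncing kernels to support it is mostly a matter of conversion and storage. Below is a hedged, std-only sketch of the round-to-nearest-even conversion trick; it is illustrative, not the GGML kernels' actual code, and it ignores NaN edge cases.

```rust
// f32 -> bf16: keep the top 16 bits, rounding to nearest-even on the
// 16 bits being dropped.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    // Adding 0x7FFF plus the least significant kept bit implements
    // round-to-nearest, ties-to-even, before truncating.
    let rounded = bits.wrapping_add(0x7FFF + ((bits >> 16) & 1));
    (rounded >> 16) as u16
}

// bf16 -> f32 is exact: shift back into the high half and zero-fill.
fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}
```

Values whose mantissa fits in 7 bits (like 1.0 or -2.0) round-trip exactly; everything else loses only low-order mantissa bits.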
Implements: https://github.com/deepseek-ai/FlashMLA - Tests are added and passing!
This PR integrates kernel developments from: https://github.com/ml-explore/mlx/pull/1924. Specifically, our `candle_nn::ops::sdpa` function now dispatches to optimized implementations for both the prompt and non-prompt cases. There is also an option for causal masking, removing...
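To make the semantics concrete, here is a naive, unoptimized reference sketch of scaled dot-product attention with optional causal masking, written in plain Rust over `(seq_len x dim)` row-major matrices. This is a didactic baseline under assumed shapes, not the fused Metal kernels the PR dispatches to; the function name is hypothetical.

```rust
// Naive scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
// with future positions masked to -inf when `causal` is set.
fn sdpa(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>], causal: bool) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .enumerate()
        .map(|(i, qi)| {
            // Attention scores for query i against every key.
            let mut scores: Vec<f32> = k
                .iter()
                .enumerate()
                .map(|(j, kj)| {
                    if causal && j > i {
                        f32::NEG_INFINITY // mask out future positions
                    } else {
                        qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale
                    }
                })
                .collect();
            // Numerically stable softmax over the scores.
            let m = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let sum: f32 = scores
                .iter_mut()
                .map(|s| {
                    *s = (*s - m).exp();
                    *s
                })
                .sum();
            // Weighted sum of the value rows.
            let dim = v[0].len();
            (0..dim)
                .map(|c| scores.iter().zip(v).map(|(w, vj)| w / sum * vj[c]).sum())
                .collect()
        })
        .collect()
}
```

The point of the optimized kernels is to compute exactly this result without materializing the full score matrix; a reference like the above is mainly useful for correctness checks.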
First off, great job on this project! I noticed that Gemm (at least for CUDA, from what I've seen) is not supported for tensors with more than 4 dims; however, the...
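The usual workaround for such a limitation is to collapse all leading "batch" dimensions into a single one, run the batched matmul, and then restore the original shape. A shape-only sketch of that fold, assuming (and this is an assumption, not candle's API) that only the last two dims participate in the matmul:

```rust
// Fold every leading dim of an N-dim shape into one batch dim,
// keeping the trailing two matrix dims untouched, e.g.
// [2, 3, 4, 5, 6] -> batch 24, matrix [5, 6].
fn flatten_leading_dims(shape: &[usize]) -> (usize, Vec<usize>) {
    let (batch, mat) = shape.split_at(shape.len() - 2);
    (batch.iter().product(), mat.to_vec())
}
```

Because the data is contiguous in row-major order, this reshape is free; only the inverse reshape is needed after the Gemm.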
Add qwen3.rs to candle-transformers.

- Fixed mod.rs
- Added qwen3.rs

I used the Qwen3 Python implementation in transformers as a reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen3/configuration_qwen3.py @LaurentMazare Could you review the code?