candle

Minimalist ML framework for Rust

Results: 407 candle issues

Would you consider supporting new models nowadays?

### Env: GPU: NVIDIA GeForce RTX 3060 (12036 MiB), CPU: 12th Gen Intel(R) Core(TM) i5-12400F, OS: Ubuntu 23.04, Model: yolov8s.pt, yolov8s.onnx, yolov8s.safetensors #### Speed test on 1000 images: - candle: ~55ms...

Add implementation for https://huggingface.co/naver/provence-reranker-debertav3-v1. This is still a WIP, but I wanted to gauge interest before going too far. ### Notes - Provence has a [CC Non Commercial license](https://huggingface.co/naver/provence-reranker-debertav3-v1/blob/main/Provence_LICENSE.txt) -...

Candle's convolution operations on CPU are quite slow compared to PyTorch. # Some numbers Conv2d run configuration: - batch_size = 2 - in_channels = 3 - width = 320 -...
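For reference, the workload being benchmarked above is a direct 2D convolution. The sketch below is a naive pure-Rust (std-only) implementation of that operation, not candle's or PyTorch's actual kernel; the shapes, stride-1/no-padding assumptions, and function name are illustrative only. Optimized kernels beat this loop nest via im2col + GEMM, SIMD, and cache blocking, which is the gap the issue is about.

```rust
// Naive direct conv2d over row-major [n, c_in, h, w] input and
// [c_out, c_in, kh, kw] weights, stride 1, no padding (illustrative sketch).
fn conv2d_naive(
    input: &[f32],
    weight: &[f32],
    n: usize, c_in: usize, h: usize, w: usize,
    c_out: usize, kh: usize, kw: usize,
) -> Vec<f32> {
    let (oh, ow) = (h - kh + 1, w - kw + 1);
    let mut out = vec![0f32; n * c_out * oh * ow];
    for b in 0..n {
        for oc in 0..c_out {
            for oy in 0..oh {
                for ox in 0..ow {
                    let mut acc = 0f32;
                    for ic in 0..c_in {
                        for ky in 0..kh {
                            for kx in 0..kw {
                                // Input pixel and weight for this tap.
                                let iv = input[((b * c_in + ic) * h + oy + ky) * w + ox + kx];
                                let wv = weight[((oc * c_in + ic) * kh + ky) * kw + kx];
                                acc += iv * wv;
                            }
                        }
                    }
                    out[((b * c_out + oc) * oh + oy) * ow + ox] = acc;
                }
            }
        }
    }
    out
}

fn main() {
    // Tiny smoke test: 1x1x3x3 input [0..8], 1x1x2x2 kernel of ones
    // -> each output is the sum of a 2x2 window.
    let input: Vec<f32> = (0..9).map(|v| v as f32).collect();
    let weight = vec![1f32; 4];
    let out = conv2d_naive(&input, &weight, 1, 1, 3, 3, 1, 2, 2);
    println!("{:?}", out); // [8.0, 12.0, 20.0, 24.0]
}
```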

https://huggingface.co/collections/PaddlePaddle/pp-ocrv5

I'm running https://github.com/huggingface/candle/tree/main/candle-examples/examples/llava vs. https://github.com/fpgaminer/joycaption/blob/main/scripts/batch-caption.py on a Mac M1. I'm seeing a significant performance difference; Candle seems much slower. I enabled the accelerate and metal features. Would love some pointers on how to improve...

https://huggingface.co/depth-anything/DA3-LARGE Depth Anything V3 dropped. It is extremely useful for monocular camera depth estimation and enables many applications that need precise 3D points.

Hi! I am attempting to get this working with CUDA support. Any ideas? Thank you! It works great without the CUDA flag. **Hardware:** - RTX480 with latest drivers - CUDA 13.0...

llama.cpp achieves superior CPU performance through thread-optimized kernels that compute directly on GGUF's native weight layouts. Candle should follow this approach to match llama.cpp's CPU efficiency and support diverse GGUF...
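To illustrate the idea of computing directly on a quantized layout, here is a minimal pure-Rust sketch of a Q8-style block dot product: weights stay as blocks of 32 `i8` values sharing one `f32` scale, and the scale is applied once per block rather than dequantizing the whole matrix to `f32` first. The block size, struct, and names are assumptions for illustration, not llama.cpp's or candle's actual GGUF kernels.

```rust
// One quantized block: 32 signed 8-bit weights plus a shared f32 scale
// (a simplified stand-in for a GGUF Q8-style layout).
const BLOCK: usize = 32;

struct Q8Block {
    scale: f32,
    qs: [i8; BLOCK],
}

// Dot product of a quantized weight row against f32 activations,
// accumulating per block and applying the scale once per block.
fn dot_q8(blocks: &[Q8Block], x: &[f32]) -> f32 {
    assert_eq!(blocks.len() * BLOCK, x.len());
    let mut acc = 0f32;
    for (bi, blk) in blocks.iter().enumerate() {
        let mut block_sum = 0f32;
        for (j, &q) in blk.qs.iter().enumerate() {
            block_sum += q as f32 * x[bi * BLOCK + j];
        }
        // Single multiply by the block scale instead of scaling each weight.
        acc += blk.scale * block_sum;
    }
    acc
}

fn main() {
    // One block: quantized value 2 with scale 0.5 => effective weight 1.0.
    let blocks = vec![Q8Block { scale: 0.5, qs: [2i8; BLOCK] }];
    let x = vec![1f32; BLOCK];
    println!("{}", dot_q8(&blocks, &x)); // 32 * 2 * 0.5 = 32
}
```

Real kernels additionally pin one matrix row range per thread and use SIMD integer instructions for the inner loop; the point here is only that no full-precision copy of the weights is ever materialized.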

## Summary The `candle-transformers/src/models/` directory has grown to contain 70+ flat module entries, mixing full and quantized implementations of the same model families. This makes the codebase harder to navigate...