candle
Minimalist ML framework for Rust
This ticket tracks my efforts to get silero-vad v5 running with candle. I have a proof-of-concept branch working, but I'm trying to merge the changes in meaningful bits. features: -...
Here is my candle implementation (taken from the examples themselves):

```rust
pub fn encode(&self, prompt: &str) -> Result {
    let tokens = self
        .tokenizer
        .encode(prompt, true)
        .map_err(E::msg)?
        .get_ids()
        .to_vec();
    let token_ids = ...
```
For a model like this one: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/transformer — how should the safetensors be loaded?
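Repos like the FLUX.1-dev transformer folder ship the weights as multiple sharded `*.safetensors` files; candle's examples typically load such files through `candle_nn::VarBuilder::from_mmaped_safetensors`. Independent of any particular loader, the safetensors container itself is simple: an 8-byte little-endian header length followed by a JSON header describing each tensor, then the raw data. A stdlib-only sketch of peeling off that header (the helper name and the toy in-memory buffer are illustrative, not candle API):

```rust
use std::convert::TryInto;

/// Extract the JSON header from a safetensors byte buffer.
/// Layout: [u64 LE header length][JSON header][tensor data].
fn safetensors_header(buf: &[u8]) -> Option<String> {
    let len = u64::from_le_bytes(buf.get(..8)?.try_into().ok()?) as usize;
    let json = buf.get(8..8 + len)?;
    String::from_utf8(json.to_vec()).ok()
}

fn main() {
    // Build a tiny in-memory "file" for illustration: one f32 tensor "w".
    let header = br#"{"w":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}"#;
    let mut buf = (header.len() as u64).to_le_bytes().to_vec();
    buf.extend_from_slice(header);
    buf.extend_from_slice(&1.0f32.to_le_bytes());
    buf.extend_from_slice(&2.0f32.to_le_bytes());
    println!("{}", safetensors_header(&buf).unwrap());
}
```

For a sharded repo, each shard is one such container, and the `model.safetensors.index.json` file maps tensor names to shards.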
Hello Sir and Madam, do you plan to add a gemma2:2b example? This model is very small and smart. Best regards, Evgeny
As discussed in #2361, our current argsort implementation does not work on CUDA for large vectors because we use a bitonic sort implementation, which requires shared memory. For some n...
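Bitonic sorting networks only operate on power-of-two sizes, which is why a GPU kernel built on them pads the input and keeps the working set in shared memory — and why the approach breaks down once a row no longer fits. A stdlib-only CPU sketch of the network itself (an illustration of the algorithm, not candle's actual CUDA kernel):

```rust
/// In-place bitonic sort. The network only works for power-of-two
/// lengths, which is why GPU implementations pad the input and keep
/// the whole block in shared memory.
fn bitonic_sort(v: &mut [u32]) {
    let n = v.len();
    assert!(n.is_power_of_two(), "bitonic networks need a power-of-two size");
    let mut k = 2;
    while k <= n {
        let mut j = k / 2;
        while j > 0 {
            // Each inner pass is embarrassingly parallel: every i is
            // compared with exactly one partner l = i ^ j.
            for i in 0..n {
                let l = i ^ j;
                if l > i {
                    let ascending = i & k == 0;
                    if (v[i] > v[l]) == ascending {
                        v.swap(i, l);
                    }
                }
            }
            j /= 2;
        }
        k *= 2;
    }
}

fn main() {
    let mut v = vec![7, 3, 1, 8, 2, 6, 5, 4];
    bitonic_sort(&mut v);
    println!("{v:?}"); // → [1, 2, 3, 4, 5, 6, 7, 8]
}
```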
Is WebGPU support on the roadmap as an alternative GPU-accelerated backend? This would be especially useful for inference on the web or for non-CUDA environments.
If the vector length is large, the error `CUDA_INVALID_VALUE` is returned:

```rust
use candle_core::{DType, Device, Tensor};

fn main() {
    let a = Tensor::zeros(
        32000,
        DType::F32,
        &Device::cuda_if_available(0).unwrap(),
    )
    .unwrap();
    dbg!(&a.arg_sort_last_dim(true));
}
```
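For reference, the expected semantics — the indices that would sort the values along the last dimension — can be sketched on the CPU with the standard library alone (assuming `true` means ascending; the helper name is illustrative, not candle API):

```rust
/// Return the indices that would sort `v` ascending, mirroring what
/// an arg-sort over the last dimension produces for a 1-D tensor.
fn argsort(v: &[f32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..v.len()).collect();
    // total_cmp gives a total order over f32, including NaN.
    idx.sort_by(|&a, &b| v[a].total_cmp(&v[b]));
    idx
}

fn main() {
    let v = [0.5f32, -1.0, 2.0, 0.0];
    println!("{:?}", argsort(&v)); // → [1, 3, 0, 2]
}
```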
I am able to load `quantized_mistral`. For the `model_id` and revision I have chosen:

```rust
let model_id = "mistralai/Mistral-7B-v0.1".to_string();
let revision = "26bca36bde8333b5d7f72e9ed20ccda6a618af24".to_string();
```

`let filenames = hub_load_safetensors(&api_repo, safetensors_file_name)?;`...
Currently, `GgmlDType` only supports F16 and not BF16. This PR introduces support for the BF16 type. I would appreciate a review to check that this looks good! I have tested with...
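For background, BF16 is simply the upper 16 bits of an IEEE-754 f32 (same sign and exponent width, truncated mantissa), so the casts themselves are cheap bit operations. A stdlib-only sketch (truncating conversion; real implementations such as the `half` crate round to nearest even):

```rust
/// BF16 is the top 16 bits of an IEEE-754 f32, so widening is a shift.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Truncating f32 -> bf16 (round-to-nearest-even omitted for brevity).
fn f32_to_bf16(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

fn main() {
    let x = 1.5f32; // exactly representable in bf16
    let b = f32_to_bf16(x);
    println!("{}", bf16_to_f32(b)); // → 1.5
}
```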
Currently, we execute a device-to-host (dtoh) copy when dequantizing to f16/f32 on CUDA, which is not necessary. We can just add a simple cast kernel to ensure that we keep the...