candle
Minimalist ML framework for Rust
This ticket tracks my efforts to get silero-vad v5 running with candle. I have a proof-of-concept branch working, but I'm trying to merge the changes in meaningful bits. features: -...
Here is my candle implementation (taken from the examples themselves):

```rust
pub fn encode(&self, prompt: &str) -> Result {
    let tokens = self
        .tokenizer
        .encode(prompt, true)
        .map_err(E::msg)?
        .get_ids()
        .to_vec();
    let token_ids = ...
```
For a model like this one: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/transformer — how should the safetensors be loaded?
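Repos like the FLUX.1-dev transformer folder ship the weights as multiple sharded `*.safetensors` files; candle's examples typically load such files through `candle_nn::VarBuilder::from_mmaped_safetensors`. Independent of any particular loader, the safetensors container itself is simple: an 8-byte little-endian header length followed by a JSON header describing each tensor, then the raw data. A stdlib-only sketch of peeling off that header (the helper name and the toy in-memory buffer are illustrative, not candle API):

```rust
use std::convert::TryInto;

/// Extract the JSON header from a safetensors byte buffer.
/// Layout: [u64 LE header length][JSON header][tensor data].
fn safetensors_header(buf: &[u8]) -> Option<String> {
    let len = u64::from_le_bytes(buf.get(..8)?.try_into().ok()?) as usize;
    let json = buf.get(8..8 + len)?;
    String::from_utf8(json.to_vec()).ok()
}

fn main() {
    // Build a tiny in-memory "file" for illustration: one f32 tensor "w".
    let header = br#"{"w":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}"#;
    let mut buf = (header.len() as u64).to_le_bytes().to_vec();
    buf.extend_from_slice(header);
    buf.extend_from_slice(&1.0f32.to_le_bytes());
    buf.extend_from_slice(&2.0f32.to_le_bytes());
    println!("{}", safetensors_header(&buf).unwrap());
}
```

For a sharded repo, each shard is one such container, and the `model.safetensors.index.json` file maps tensor names to shards.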
Hello Sir and Madam, do you plan to add a gemma2:2b example? This model is very small and smart. Best regards, Evgeny
As discussed in #2361, our current argsort implementation does not work on CUDA for large vectors because we use a bitonic sort implementation, which requires shared memory. For some n...
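Bitonic sorting networks only operate on power-of-two sizes, which is why a GPU kernel built on them pads the input and keeps the working set in shared memory — and why the approach breaks down once a row no longer fits. A stdlib-only CPU sketch of the network itself (an illustration of the algorithm, not candle's actual CUDA kernel):

```rust
/// In-place bitonic sort. The network only works for power-of-two
/// lengths, which is why GPU implementations pad the input and keep
/// the whole block in shared memory.
fn bitonic_sort(v: &mut [u32]) {
    let n = v.len();
    assert!(n.is_power_of_two(), "bitonic networks need a power-of-two size");
    let mut k = 2;
    while k <= n {
        let mut j = k / 2;
        while j > 0 {
            // Each inner pass is embarrassingly parallel: every i is
            // compared with exactly one partner l = i ^ j.
            for i in 0..n {
                let l = i ^ j;
                if l > i {
                    let ascending = i & k == 0;
                    if (v[i] > v[l]) == ascending {
                        v.swap(i, l);
                    }
                }
            }
            j /= 2;
        }
        k *= 2;
    }
}

fn main() {
    let mut v = vec![7, 3, 1, 8, 2, 6, 5, 4];
    bitonic_sort(&mut v);
    println!("{v:?}"); // → [1, 2, 3, 4, 5, 6, 7, 8]
}
```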
Is WebGPU support on the roadmap as an alternative GPU-accelerated backend? This would be especially useful for inference on the web or for non-CUDA environments.
If the vector length is large, the error `CUDA_INVALID_VALUE` is returned:

```rust
use candle_core::{DType, Device, Tensor};

fn main() {
    let a = Tensor::zeros(
        32000,
        DType::F32,
        &Device::cuda_if_available(0).unwrap(),
    )
    .unwrap();
    dbg!(&a.arg_sort_last_dim(true));
}
```
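For reference, the expected semantics — the indices that would sort the values along the last dimension — can be sketched on the CPU with the standard library alone (assuming `true` means ascending; the helper name is illustrative, not candle API):

```rust
/// Return the indices that would sort `v` ascending, mirroring what
/// an arg-sort over the last dimension produces for a 1-D tensor.
fn argsort(v: &[f32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..v.len()).collect();
    // total_cmp gives a total order over f32, including NaN.
    idx.sort_by(|&a, &b| v[a].total_cmp(&v[b]));
    idx
}

fn main() {
    let v = [0.5f32, -1.0, 2.0, 0.0];
    println!("{:?}", argsort(&v)); // → [1, 3, 0, 2]
}
```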
I am able to load `quantized_mistral`. For the `model_id` and revision I have chosen:

```rust
let model_id = "mistralai/Mistral-7B-v0.1".to_string();
let revision = "26bca36bde8333b5d7f72e9ed20ccda6a618af24".to_string();
```

`let filenames = hub_load_safetensors(&api_repo, safetensors_file_name)?;`...
Currently, `GgmlDType` only supports F16 and not BF16. This PR introduces support for the BF16 type. I would appreciate a review to check that this looks good! I have tested with...
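For background, BF16 is simply the upper 16 bits of an IEEE-754 f32 (same sign and exponent width, truncated mantissa), so the casts themselves are cheap bit operations. A stdlib-only sketch (truncating conversion; real implementations such as the `half` crate round to nearest even):

```rust
/// BF16 is the top 16 bits of an IEEE-754 f32, so widening is a shift.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Truncating f32 -> bf16 (round-to-nearest-even omitted for brevity).
fn f32_to_bf16(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

fn main() {
    let x = 1.5f32; // exactly representable in bf16
    let b = f32_to_bf16(x);
    println!("{}", bf16_to_f32(b)); // → 1.5
}
```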
Currently, we execute a device-to-host (dtoh) copy when dequantizing to f16/f32 on CUDA, which is not necessary. We can just add a simple cast kernel to ensure that we keep the...