candle
Minimalist ML framework for Rust
`cargo run --features metal --example gemma -- --which code-7b-it --prompt "explain isakmpd's architecture"` fails with:
```
retrieved the files in 27.197292ms
loaded the model in 36.859128625s
explain isakmpd's architectureError: Metal...
```
Leaving as a draft to solicit feedback on the introduction of this package. During my research on WebGPU it seemed like this was a pretty common util being used in...
Return a Cow from to_cpu_storage to avoid unnecessary copy. Address one of the issues in https://github.com/huggingface/candle/issues/1699
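As a rough illustration of the idea behind returning a `Cow`, the sketch below borrows the data when it already lives on the CPU and copies only when it has to be fetched from elsewhere. The `Storage` enum, its variants, and `sum_on_cpu` are hypothetical stand-ins for illustration, not candle's actual types or the signature of `to_cpu_storage`.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a tensor storage type; NOT candle's real `Storage`.
enum Storage {
    // Data already resident in host memory.
    Cpu(Vec<f32>),
    // Pretend device memory: reading it back requires a copy.
    Device(Vec<f32>),
}

impl Storage {
    // Borrow when the data is already on the CPU, allocate only when a copy is unavoidable.
    fn to_cpu(&self) -> Cow<'_, [f32]> {
        match self {
            Storage::Cpu(data) => Cow::Borrowed(data.as_slice()),
            Storage::Device(data) => Cow::Owned(data.clone()),
        }
    }
}

// Callers can read through the `Cow` without caring which variant they received.
fn sum_on_cpu(storage: &Storage) -> f32 {
    storage.to_cpu().iter().sum()
}
```

With this shape, a caller that only reads the data pays for an allocation in the device case alone.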
It seems that clearing the cache in the current Falcon model implementation is not working properly. Every time a second query is run, the cache is not cleared.
This PR sets up the Metal backend for the change proposed in https://github.com/huggingface/candle/pull/2037. The goal here is to have no actual change at runtime for this diff,...
During our benchmark testing, we noticed that the Candle backend for Burn was finishing up quickly for the Metal device. Upon closer inspection, we have discovered that `wait_until_completed` is not...
Hi, we are researchers from [Sunlab](https://sunlab-gmu.github.io/). When we scanned Rust-based repositories with our own bug detectors, we found some potentially unsound usages of `slice::from_raw_parts`...
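For context on why `slice::from_raw_parts` can be unsound, the sketch below shows one common pattern: reinterpreting a byte buffer as `&[f32]`, guarded by the alignment and length checks the function's safety contract requires. `bytes_as_f32` is a hypothetical helper written for illustration; it is not taken from candle or from the report, whose specific findings are not shown here.

```rust
use std::mem::{align_of, size_of};
use std::slice;

// Hypothetical helper, not part of candle: view a byte buffer as `&[f32]`
// only when the preconditions of `slice::from_raw_parts` hold.
fn bytes_as_f32(bytes: &[u8]) -> Option<&[f32]> {
    let ptr = bytes.as_ptr();
    // The pointer must be aligned for f32, and the length must be a whole
    // number of f32 elements; otherwise refuse instead of invoking UB.
    if (ptr as usize) % align_of::<f32>() != 0 || bytes.len() % size_of::<f32>() != 0 {
        return None;
    }
    // SAFETY: alignment and length were checked above, every bit pattern is a
    // valid f32, and the returned slice borrows from `bytes`, so it cannot
    // outlive the underlying buffer.
    Some(unsafe { slice::from_raw_parts(ptr.cast::<f32>(), bytes.len() / size_of::<f32>()) })
}
```

Skipping either check and calling `from_raw_parts` directly on an arbitrary `&[u8]` is undefined behavior whenever the buffer happens to be misaligned.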
The `quantized` example was compiled using `cargo build --example quantized -r --features metal`. Unsure of... how many layers accelerated / how many threads used / clearly different sample stages ..yet I presume...
The README mentions that candle supports multi-GPU inference, using NCCL under the hood. How can this be implemented? I wonder if there is any available example...