candle
Minimalist ML framework for Rust
This commit refactors the previously separate implementations of arithmetic operations (Add, Sub, Mul, Div) between f64 and Tensor types into a single, reusable macro `impl_f64_tensor_ops`.
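A minimal sketch of that pattern, using a toy `Tensor` type rather than candle's real one; only the macro name comes from the commit, and the delegation through a `scalar_op` helper is an illustrative assumption:

```rust
// Toy stand-in for candle's Tensor, just to show the macro's shape.
#[derive(Debug, Clone, Copy)]
struct Tensor(f64);

impl Tensor {
    // Hypothetical helper: apply a scalar/element binary function.
    fn scalar_op(self, rhs: f64, f: impl Fn(f64, f64) -> f64) -> Tensor {
        Tensor(f(self.0, rhs))
    }
}

// One macro generates both directions (f64 op Tensor and Tensor op f64)
// for each arithmetic trait, instead of four near-identical impl blocks.
macro_rules! impl_f64_tensor_ops {
    ($($op_trait:ident, $op_fn:ident, $op:tt);*) => {
        $(
            // f64 <op> Tensor
            impl std::ops::$op_trait<Tensor> for f64 {
                type Output = Tensor;
                fn $op_fn(self, rhs: Tensor) -> Tensor {
                    rhs.scalar_op(self, |t, s| s $op t)
                }
            }
            // Tensor <op> f64
            impl std::ops::$op_trait<f64> for Tensor {
                type Output = Tensor;
                fn $op_fn(self, rhs: f64) -> Tensor {
                    self.scalar_op(rhs, |t, s| t $op s)
                }
            }
        )*
    };
}

impl_f64_tensor_ops!(Add, add, +; Sub, sub, -; Mul, mul, *; Div, div, /);

fn main() {
    let t = Tensor(2.0);
    println!("{:?} {:?}", 3.0_f64 - t, t / 4.0); // Tensor(1.0) Tensor(0.5)
}
```

Expanding both operand orders from a single macro invocation is what removes the previously duplicated hand-written impl blocks.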
First of all, thanks for candle: it's an impressive project, and as a user the speed at which updates are coming is really cool! I'm trying to use a speedyspeech onnx model...
I only see that candle returns last_hidden_state, but not all_hidden_states and attentions. I want to get the attentions. Can I submit a PR to add this? I originally wanted to define...
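One possible shape for such a change, as a sketch only (the struct and field names below are assumptions, not candle's existing API):

```rust
use candle_core::Tensor;

/// Hypothetical richer output type: keep `last_hidden_state` as today, and
/// optionally collect per-layer hidden states and attention weights when the
/// caller asks for them.
pub struct ModelOutput {
    pub last_hidden_state: Tensor,
    pub all_hidden_states: Option<Vec<Tensor>>,
    pub attentions: Option<Vec<Tensor>>,
}
```

The `Option`s keep the default forward pass cheap: callers that don't request the extra tensors pay nothing for them.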
Hello, there appears to be a bug in the trocr example, possibly with the way images are tokenized. For example, this image produces 754754.7 instead of 754.7. I have added...
Currently, `QTensor::quantize`:
- Takes a tensor (assume it is on the GPU for this example)
- Copies the data to the CPU
- Quantizes on the CPU
- Copies the...
Try to resolve https://github.com/huggingface/candle/issues/2294
outputs: Err(WithBacktrace { inner: Msg("unsupported op_type Split for op NodeProto { input: [\"/model.2/cv1/act/Mul_output_0\", \"onnx::Split_64\"], output: [\"/model.2/Split_output_0\", \"/model.2/Split_output_1\"], name: \"/model.2/Split\", op_type: \"Split\", domain: \"\", attribute: [AttributeProto { name: \"axis\", ref_attr_name: \"\", doc_string:...
I am trying to implement an adaptive avg pool in candle. However, I guess my implementation will require an API to get the raw data/storage (stored in a plain/flattened array format)...
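For reference, this is roughly what adaptive average pooling looks like over a flat array in plain Rust, independent of candle's storage API; the bucket boundaries follow the usual floor/ceil rule:

```rust
/// 1-D adaptive average pooling over a flat slice: the input is split into
/// `output_len` nearly equal buckets and each bucket is averaged.
fn adaptive_avg_pool1d(input: &[f32], output_len: usize) -> Vec<f32> {
    let n = input.len();
    (0..output_len)
        .map(|i| {
            // Bucket i covers floor(i*n/out) .. ceil((i+1)*n/out).
            let start = i * n / output_len;
            let end = ((i + 1) * n + output_len - 1) / output_len;
            let bucket = &input[start..end];
            bucket.iter().sum::<f32>() / bucket.len() as f32
        })
        .collect()
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0];
    // Pool 7 values down to 3 outputs: [2.0, 4.0, 6.0].
    println!("{:?}", adaptive_avg_pool1d(&x, 3));
}
```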
Qwen/Qwen2-1.5B works correctly in the example, but Qwen/Qwen2-7B doesn't. `(base) lyn@A100DEV:~/workspace/candle/candle-examples$ cargo run --release --features cuda --example qwen -- --model 2-7b --prompt "Hello\n" Finished release [optimized] target(s) in...
#### Very barebones implementation of Llama multinode for distributed inference
- Adds support for running Llama model inference across multiple nodes and GPUs.
- A simple TCP server to exchange...
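A minimal sketch of the kind of exchange such a setup relies on, using only `std::net` rather than the PR's actual code: one node echoes a length-prefixed buffer of activations back to the sender.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // "Remote" node: receive a length-prefixed byte buffer and echo it back.
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (mut stream, _) = listener.accept()?;
        let mut len_buf = [0u8; 8];
        stream.read_exact(&mut len_buf)?;
        let len = u64::from_le_bytes(len_buf) as usize;
        let mut payload = vec![0u8; len];
        stream.read_exact(&mut payload)?;
        stream.write_all(&len_buf)?;
        stream.write_all(&payload)?;
        Ok(())
    });

    // "Local" node: serialize some f32 activations and send them over.
    let activations: Vec<f32> = vec![0.1, 0.2, 0.3, 0.4];
    let bytes: Vec<u8> = activations.iter().flat_map(|x| x.to_le_bytes()).collect();

    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(&(bytes.len() as u64).to_le_bytes())?;
    stream.write_all(&bytes)?;

    // Read the echoed buffer back and decode it.
    let mut len_buf = [0u8; 8];
    stream.read_exact(&mut len_buf)?;
    let mut reply = vec![0u8; u64::from_le_bytes(len_buf) as usize];
    stream.read_exact(&mut reply)?;
    let decoded: Vec<f32> = reply
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    println!("round-tripped activations: {:?}", decoded);

    server.join().unwrap()?;
    Ok(())
}
```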