candle issues

Extract RotaryEmbedding code for reuse across models.

1

Most models use identical or almost identical copies of RotaryEmbedding (cfg.rope_theta vs hardcoded 10000, rope_theta being f32 or f64, chunk() vs 2 calls to narrow()). A few others (mixformer,...

janimo

Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16

2

Win11 Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16

jefftantan

Falcon example seems broken (on metal)

3

I am trying to run falcon locally on my machine, on main branch, through: `cargo run --release --features metal --example falcon -- --prompt "write a hello world rust program"` which...

jorgeantonio21

Running models with different precisions

8

I am testing different model architectures, and when loading the model weights (e.g. for falcon or mamba architectures) with precision either `bf16` or `f16` I usually get this error: `Candle...

jorgeantonio21

Cannot run examples with --features cuda option

54

CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true warning: some crates are on edition 2021 which defaults to `resolver = "2"`, but virtual workspaces default to `resolver = "1"` note: to keep the current resolver, specify `workspace.resolver...

dbrowne

Metal iOS

5

Great framework! Is the usage of Metal already possible on iOS? I'm trying to run the Phi example on iOS and I can only get it to work with a...

soldelacroix

Support for tensors with 0-length dimensions

Operations on tensors with zero-length dimensions are supported in other libraries such as PyTorch, and would be nice to have support for here. For example, when I multiply a 0-by-K...

michaeleisel

The output diverges in comparison to the Python implementation.

5

I've noticed that the generation diverges after some tokens in comparison to the HF implementation. Is this expected? Here's how to reproduce: **Transformers** ```python import torch from transformers import AutoTokenizer,...

hugoabonizio

Extreme slow inference speed on CPU when trying blip example

4

I have followed the tutorial and set up my first rust example. However, I found that the inference speed is faster compared to torch on GPU (780ms per image vs...

xcmgttacct

Could someone please explain why this is happening? (batcher.rs seq_len:4294967040)

1

Hello! I am a student in Korea. While trying to write training code for an LLM, I encountered a problem. My code references the "llama2-c > training.rs" code, which uses it like this....

dotori1995
