Jorge António
After running the command multiple times: ``` cargo run --release --features metal --example phi -- --model 3 --prompt "The best thing about coding in rust is " ``` I realized...
After trying the llama example with either the `cuda` or `flash-attn` feature, I realized the generation times are quite similar. I would expect flash attention to have a significant improvement in the...
It seems that clearing the cache in the current Falcon model implementation is not working properly: every time a second query is run, the cache is not cleared.
The README mentions that candle supports multi-GPU inference, using NCCL under the hood. How can this be implemented? I wonder if there is any available example...
I am trying to run falcon locally on my machine, on the main branch, through: `cargo run --release --features metal --example falcon -- --prompt "write a hello world rust program"` which...
I am testing different model architectures, and when loading the model weights (e.g. for the falcon or mamba architectures) with either `bf16` or `f16` precision, I usually get this error: `Candle...
I have been experimenting with the Luminal examples on my MacBook M3, and every time I run ```sh cargo run --release --features metal ``` I get a nonsensical answer, like:...
Currently, I can't extract an output by running the phi3 example:
```
% cargo run --release --features metal \
    Finished release [optimized] target(s) in 0.27s
     Running `/Users/jorgeantonio/dev/luminal/target/release/phi`
Defining graph -
```
...
Running the command ```cargo run --example mamba --release --features metal -- --prompt "Tell me a joke please" --dtype f16``` does not work. The problem seems to lie in this code: ```rs...