Jorge António
After running the command multiple times: ``` cargo run --release --features metal --example phi -- --model 3 --prompt "The best thing about coding in rust is " ``` I realized...
After trying the llama example with either the `cuda` or `flash-attn` feature, I realized the generation times are quite similar. I would expect flash attention to have a significant improvement in the...
It seems that clearing the cache in the current Falcon model implementation is not working properly: every time a second query is run, the cache is not cleared.
The README mentions that candle supports multi-GPU inference, using NCCL under the hood. How can this be implemented? I wonder if there is any available example...
I am trying to run falcon locally on my machine, on the main branch, through: `cargo run --release --features metal --example falcon -- --prompt "write a hello world rust program"` which...
I am testing different model architectures, and when loading the model weights (e.g. for the falcon or mamba architectures) with either `bf16` or `f16` precision, I usually get this error: `Candle...
I have been experimenting with the Luminal examples on my MacBook M3, and every time I run ```sh cargo run --release --features metal ``` I get a nonsensical answer, like:...
Currently, I can't extract an output by running the phi3 example:
```
% cargo run --release --features metal \
    Finished release [optimized] target(s) in 0.27s
     Running `/Users/jorgeantonio/dev/luminal/target/release/phi`
Defining graph -
```
...
Running the command ```cargo run --example mamba --release --features metal -- --prompt "Tell me a joke please" --dtype f16``` does not work. The problem seems to lie in this code: ```rs...