candle
M1 process stuck while trying to run Mixtral example
I tried Mixtral on my 128GB M1 Mac.
The process state flips between running and stuck.
The CLI output from the example does not progress past this point:
Running `target/release/examples/mixtral --prompt 'def print_prime(n): '`
avx: false, neon: true, simd128: false, f16c: false
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 83.206834ms
loaded the model in 152.927567834s
def print_prime(n
I started the process with the following command from an up-to-date repo:
cargo run --features metal --example mixtral --release -- --prompt "def print_prime(n): "
Actually, using Mixtral on Metal was falling back to f32, as we didn't have bf16 support for matmul on gemm. So this would require > 200GB of memory to work. @ivarflakstad just added support for bf16 matmul in #2364, so you can give it another try with the changes in #2378 (I cannot really try it myself, as my Mac is not beefy enough). If that works well, we'll generalize it to other bf16-based models.
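To make the memory argument above concrete, here is a rough back-of-the-envelope sketch in Rust. It is not part of candle: `WeightDType`, `weight_memory_gb` and `ASSUMED_MIXTRAL_PARAMS` are hypothetical names, and the parameter count of roughly 46.7 billion for Mixtral 8x7B is an assumption brought in for illustration, not a number from this thread. The estimate covers the weights only, so actual usage (activations, KV cache, temporary buffers) would be higher.

```rust
/// The two weight dtypes discussed in this thread (hypothetical helper type,
/// not candle's own `DType`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum WeightDType {
    F32,
    BF16,
}

impl WeightDType {
    /// Size of one element in bytes.
    fn size_in_bytes(self) -> u64 {
        match self {
            WeightDType::F32 => 4,
            WeightDType::BF16 => 2,
        }
    }
}

/// Assumption: approximate total parameter count of Mixtral 8x7B.
const ASSUMED_MIXTRAL_PARAMS: u64 = 46_700_000_000;

/// Memory needed to hold the weights alone, in GB (10^9 bytes).
fn weight_memory_gb(param_count: u64, dtype: WeightDType) -> f64 {
    (param_count * dtype.size_in_bytes()) as f64 / 1e9
}
```

Under that assumption, `weight_memory_gb(ASSUMED_MIXTRAL_PARAMS, WeightDType::F32)` comes out at roughly 187 GB for the weights alone, well above a 128GB machine, while the `BF16` variant is roughly 93 GB, which is why bf16 matmul support matters here.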
My system is using DType::F32 from device.bf16_default_to_f32().
Can I provide any useful info for debugging?
Had to revert because while it was correct on M3 it was not on M1/M2. I’m looking into it :)
This should be solved now, and likely has been solved for quite a while.