luminal
Phi model does not produce output on M3
Currently, I get no generated output when running the phi3 example:
% cargo run --release --features metal
Finished release [optimized] target(s) in 0.27s
Running `/Users/jorgeantonio/dev/luminal/target/release/phi`
Defining graph - 75ms
Compiling graph - 4799ms
Loading model - 3544ms
Processing Prompt - 183ms (71.04 tok/s, 13 prompt tokens)
<|user|>
Please write me a python implementation of merge sort<|end|>
<|assistant|>
Average token generated in 46.66ms - (21.43 tok/s)
This issue is related to #51
Does this still happen if you pull the main branch? I believe this has been fixed for others. It may be the same M3 issue that llama is facing
I'm fairly certain the problem is the softmax kernel producing inf on your machine, which makes the logits come out NaN and causes the blank token to be emitted, which is why you see no output at all. I will be revisiting the softmax kernel today or tomorrow to fix this
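For readers unfamiliar with this failure mode, here is a minimal CPU-side sketch of how a softmax can turn an inf into NaN. It is not the luminal Metal kernel, and the function names `softmax_naive` and `softmax_stable` are hypothetical, chosen only for illustration. It assumes the overflow comes from exponentiating large values in `f32`; the actual cause on M3 may differ.

```rust
/// Naive softmax (illustrative only): exponentiates the raw values directly.
/// In f32, exp(x) overflows to +inf for x above roughly 88.7, so the sum
/// becomes inf and inf / inf evaluates to NaN.
fn softmax_naive(logits: &[f32]) -> Vec<f32> {
    let exps: Vec<f32> = logits.iter().map(|x| x.exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Numerically stable softmax (illustrative only): subtracts the maximum
/// before exponentiating, so every exponent is <= 0 and exp() stays in (0, 1].
fn softmax_stable(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

Calling `softmax_naive(&[100.0, 0.0])` yields NaN for the first element, while `softmax_stable` on the same input returns finite probabilities that sum to 1. Subtracting the maximum does not change the mathematical result, since the common factor cancels between numerator and denominator.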
I pulled the main branch right now, and the problem persists.
Thank you so much @jafioti !
Yes. If you comment out SoftmaxCompiler in luminal_metal's lib.rs, the Phi (and Llama) examples will work on M3
@mikeseven Does it give proper outputs? In the other issue you mentioned it gives bad outputs
Sorry for the confusion. I meant to say that the output looks correct but is not as good as with llama. It looks to me like a model accuracy issue.
Ok I'll close this for now then, thanks