Phi architecture broken on arm64 in cpu mode

Open walter-cavinaw opened this issue 1 year ago • 3 comments

I noticed that llamafile doesn't work on certain machines with models that use the newer quantization schemes (_K, _K_M).

Llama.cpp has pushed a fix that is worth adopting:

https://github.com/ggerganov/llama.cpp/pull/4630

walter-cavinaw avatar Jan 02 '24 01:01 walter-cavinaw

Thanks for bringing it to my attention. I'm skeptical. Our build system targets ARMv8.0 so __ARM_FEATURE_DOTPROD is never defined (unless you override our build config). I've used models like rocket-3b.Q3_K_M.gguf just fine on devices like Raspberry Pi, which do not have the dotprod feature. Could you clarify how you're encountering this issue with llamafile?

jart avatar Jan 02 '24 02:01 jart
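
For anyone following along who wants to check whether a given ARM64 Linux device reports the dotprod feature mentioned above, here is a minimal sketch. It assumes the kernel lists the feature as the `asimddp` flag on the `Features` line of `/proc/cpuinfo`; the function names `cpu_features` and `has_dotprod` are made up for illustration. Note that this only reports what the CPU supports at runtime. Whether a binary actually uses dotprod code paths is decided at compile time by `__ARM_FEATURE_DOTPROD`, which this script cannot see.

```python
def cpu_features(cpuinfo_text):
    """Return the flags from the first 'Features' line of /proc/cpuinfo text.

    Assumption: arm64 Linux prints one 'Features : ...' line per core,
    with flags separated by whitespace.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            return set(value.split())
    return set()


def has_dotprod(cpuinfo_text):
    """True if the 'asimddp' flag (assumed name for dotprod) is listed."""
    return "asimddp" in cpu_features(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or the file is unreadable: treat as "no information".
        text = ""
    print("dotprod (asimddp) reported:", has_dotprod(text))
```

If this prints `False` on the device where a model produces gibberish, that supports the idea that the failure is in the non-dotprod code path rather than in the dotprod one.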

What I noticed was that phi-2.Q4_0.llamafile worked on my Raspberry Pi, while phi-2.Q4_K_M.llamafile produced gibberish. I went looking for the reason the newer schemes didn't work properly. The llama.cpp fix seemed like the most probable answer, but I understand what you're saying.

Does that Q4_K_M llamafile work for you? I already matched my OS to yours (to fix another bug), so perhaps I'm just missing some OS-level config or environment variable, as unlikely as that seems?

walter-cavinaw avatar Jan 02 '24 05:01 walter-cavinaw

I took a closer look and:

  • Metal GPU - OK
  • Nvidia GPU - OK
  • AMD64 CPU - OK
  • ARM64 CPU - BUSTED

Phi support was added as a cherry-pick due to popular request. We normally do a full sync with llama.cpp upstream once a month. What probably happened is that we need some other commit too. The problem will fix itself soon when the next sync happens. If anyone can help us pinpoint the cause (which is probably something in ggml.c rather than ggml-quants.c), then I'll cherry-pick that ASAP.

jart avatar Jan 02 '24 06:01 jart

@walter-cavinaw You were absolutely correct. It was the dotprod quant refactor that fixed things. I'll have a fix pushed and a new release soon!

jart avatar Jan 05 '24 06:01 jart