alpaca-electron
Performance is much worse than in llama.cpp on Apple Silicon (M1/M2)
Inference in general is much slower than with llama.cpp on my M2 Pro, using any model (I tried Vicuna 13B, LLaMA 13B, and Alpaca 13B).
I would say roughly 3x slower, all else being equal. I don't have exact timings, but it's obvious without counting the seconds.
I think the macOS binaries were built just a day before the Apple Silicon (ARM64) optimization was merged into llama.cpp. If you want, you can send me the new llama.cpp binaries (just main is fine) and I can update the app.
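For anyone who wants to rebuild it themselves, getting a fresh main from a current llama.cpp checkout on Apple Silicon is roughly this (a minimal sketch; the model path below is just the usual llama.cpp example layout, and flags can differ between versions):

```sh
# Clone and build llama.cpp; on Apple Silicon the default make build
# picks up the ARM NEON / Accelerate optimizations.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Quick sanity check with a quantized model (path is just an example).
./main -m ./models/7B/ggml-model-q4_0.bin -p "Hello" -n 32
```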
I just sent you the files on Discord! :)
Awesome -- I was having trouble figuring out how to attach here, haha.
Thank you.
Are you able to get it working at all with the M2 Pro? Mine just permanently hangs after prompting.
Yes, I have an M2 Pro too (32 GB RAM) and it works, just very, very slowly. Even the 30B model loaded, but I mostly tried 13B.
Interesting. I have an M2 Max with 64 GB and I can't even get the 7B to work.
I am using a Mac Mini M2 with 32 GB; the 7B model is the best so far and pretty smart. The 30B is too slow.
I wish it could be at the level of ChatGPT, which can generate Blender Python.
OK everyone, I just updated the v1.0.5 macOS ARM64 release. Try downloading and installing it again. It should be faster now.
Works amazingly now!
Works great! Thanks for the change. The only problem now is that after clearing the chat I get

bash: Would: command not found
bash-3.2$

but I'm assuming that's unrelated to the ARM64 build.
Yeah, that's unrelated to the ARM64 macOS build. It seems something changed in the new llama.cpp and ^C now also exits llama.cpp on Linux and macOS. I haven't added code to check for that on macOS and Linux yet.