llama.cpp
Apple Mac Studio M1 Max out of memory?
I'm experiencing a memory/loading issue on an M1 Max Mac Studio when loading 30B and 65B models with Metal. It looks like a memory limit has been reached, but I have plenty of RAM. 7B and 13B models work fine!
PS: Sorry for my bad English, I am not a native speaker.
It is not solved yet. You can actively follow this PR, where the issue is being investigated: https://github.com/ggerganov/llama.cpp/pull/1826
thank you :)
Just tried the fresh version of the Python bindings with the workaround PR merged; it still crashes with a 30B model when n_gpu_layers > 0:
```
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: /private/var/folders/zk/hd0v0z2910x13xv8hq213c600000gn/T/pip-install-pcxrvank/llama-cpp-python_c6c616faeeea45d696c56e85205281f2/vendor/llama.cpp/ggml-metal.m:969: false
```
UPD: Looks like this only happens with n_ctx = 2048 on a 32 GB MacBook; lower n_ctx values (<= 1536) work fine.
Hope this helps!
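For reference, here is a minimal sketch of that workaround with llama-cpp-python (the model path is a placeholder, and the n_ctx / n_gpu_layers values are just the ones reported above, not universal limits):

```python
from llama_cpp import Llama

# Placeholder path -- point this at your own GGML model file.
llm = Llama(
    model_path="./models/ggml-model-q4_0.bin",
    n_ctx=1536,       # <= 1536 reportedly avoids "status 5" on a 32 GB machine
    n_gpu_layers=1,   # any value > 0 enables Metal offloading
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=48)
print(out["choices"][0]["text"])
```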
My env: Apple M2 Pro, 16 GB. Model: llama-2-13b-chat.ggmlv3.q4_0.bin
I hit the same "failed with status 5" problem; after changing n_ctx from 4096 to 2000, it works fine.
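That matches the workaround above: the KV cache grows linearly with n_ctx, so halving the context roughly halves that allocation. A rough back-of-the-envelope sketch, assuming an f16 KV cache and the usual LLaMA-13B shape (40 layers, 5120-dim embeddings; these dimensions are assumptions on my part, not something from the thread):

```python
def kv_cache_bytes(n_layer: int, n_ctx: int, n_embd: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size: one K and one V tensor per layer, f16 elements."""
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem

# Assumed LLaMA-13B dimensions: 40 layers, n_embd = 5120.
for n_ctx in (2000, 4096):
    gib = kv_cache_bytes(40, n_ctx, 5120) / 2**30
    print(f"n_ctx={n_ctx}: ~{gib:.2f} GiB of KV cache")

# n_ctx=4096 needs roughly twice the cache of n_ctx=2000 (~3.1 vs ~1.5 GiB),
# which, on top of the ~7 GB of q4_0 weights, can push past the Metal
# working-set budget on a 16 GB machine.
```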
This issue was closed because it has been inactive for 14 days since being marked as stale.