Apple Studio M1 Max out of memory?

Open JasonOSX opened this issue 2 years ago • 3 comments

I'm experiencing a memory/loading issue on an M1 Max Studio when loading 30B and 65B models with Metal. It looks like a memory limit has been reached, but I have plenty of memory available. 7B and 13B models work fine!

PS: Sorry for my bad English, I am not a native speaker.

JasonOSX avatar Jun 16 '23 11:06 JasonOSX

It is not solved yet. You can actively follow this PR, where the issue is being investigated: https://github.com/ggerganov/llama.cpp/pull/1826

ymcui avatar Jun 16 '23 11:06 ymcui

> It is not solved yet. You can actively follow this PR, where the issue is being investigated: #1826

thank you :)

JasonOSX avatar Jun 16 '23 11:06 JasonOSX

Just tried the latest version of the Python binding with the workaround PR merged; it still crashes with a 30B model when n_gpu_layers > 0:

ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: /private/var/folders/zk/hd0v0z2910x13xv8hq213c600000gn/T/pip-install-pcxrvank/llama-cpp-python_c6c616faeeea45d696c56e85205281f2/vendor/llama.cpp/ggml-metal.m:969: false

UPD: Looks like this only happens with n_ctx = 2048 on a 32 GB MacBook; lower n_ctx values (<= 1536) work fine.
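For anyone hitting the same assert from the Python binding, here is a minimal sketch of the workaround, assuming llama-cpp-python's `Llama` constructor (the model path is a placeholder):

```python
from llama_cpp import Llama

# Placeholder path; point this at your own GGML model file.
llm = Llama(
    model_path="./models/30B/ggml-model-q4_0.bin",
    n_ctx=1536,      # values <= 1536 avoided the status-5 crash on a 32 GB machine
    n_gpu_layers=1,  # any value > 0 enables Metal offload, which triggered the crash at n_ctx=2048
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```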

remixer-dec avatar Jun 20 '23 20:06 remixer-dec

Hope this helps!

My env: Apple M2 Pro, 16 GB
Model: llama-2-13b-chat.ggmlv3.q4_0.bin

I met the same problem ("failed with status 5"). I changed n_ctx from 4096 to 2000, and then it works fine.
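Lowering n_ctx helps because the KV cache grows linearly with the context size. A back-of-the-envelope sketch, assuming LLaMA-13B's shape (n_layer = 40, n_embd = 5120) and an f16 KV cache, as llama.cpp used by default at the time; the numbers are illustrative, not measured:

```python
# Rough KV-cache size estimate for a LLaMA-13B-class model.
# Assumes n_layer=40, n_embd=5120 and an f16 cache (2 bytes/element),
# with both K and V stored per layer (hence the factor of 2).
def kv_cache_bytes(n_ctx, n_layer=40, n_embd=5120, bytes_per_elem=2):
    return n_ctx * n_layer * 2 * n_embd * bytes_per_elem

for n_ctx in (4096, 2000):
    print(f"n_ctx={n_ctx}: ~{kv_cache_bytes(n_ctx) / 2**30:.2f} GiB")
# n_ctx=4096: ~3.12 GiB
# n_ctx=2000: ~1.53 GiB
```

On a 16 GB machine that is already holding roughly 7 GB of q4_0 weights, that ~1.6 GiB difference can be enough to keep the Metal working set under the limit.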


linghunjiushu avatar Aug 07 '23 09:08 linghunjiushu

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Apr 10 '24 01:04 github-actions[bot]