llama.cpp
Metal issues with Intel Mac
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
The server runs normally.
Current Behavior
After launching the command below, the macOS GUI freezes.

`./server -m path-to-gguf/some-gguf-previously-worked`
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux: MacBook Pro 2022, 4-core Intel Core i5
- Operating System, e.g. for Linux: macOS 13.5.1
- SDK version, e.g. for Linux: Xcode 15.0.1 (15A507)
Failure Information (for bugs)
The GUI froze after loading the Metal shader file.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- build from the git source
- run `./server -m path-to-gguf`

The screen output suggests:

```
ggml_metal_init: found device: Intel(R) Iris(TM) Plus Graphics
ggml_metal_init: picking default device: Intel(R) Iris(TM) Plus Graphics
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '....../llama.cpp/ggml-metal.metal'
```
Then the GUI froze. After about two minutes, the watchdog triggered and the OS panicked.
Tested with a smaller model, e.g. llama-2-7b-chat.Q4_0.gguf from TheBloke/llama2-7b-chat-gguf.
With `-ngl 0` (no Metal acceleration), the server output was normal and reasonable.
With `-ngl 1` or higher, the output was garbled:
User: hello
Llama: hew$omm friosingahnattle Chefattrtcipeahn stackozommotrлю stack numerura Matth Fri friowyazi Stanis Esp stack stackahn Mos Basicovi immugenemblyabormosaga stackommtags attackuraahnovyommahnahnowyelsk Fra Chetags ub Grundfileowy Domwohlprevioustagsowyahnotrowy openahnéraahn Cheuraватиahnfattcipeosing stacklem Culturartahnahnahn numer Fland Matthmodel Stanisazi Fland friowy Stanis Fland Stanisowyazi Stanis$tags Stanis Flandabor Mos stack Stanisattle rag Espowyomm Stanisaga Cheelsk Fri model stack Matthaziovi Basicugen Stanisowyembly stackowy dispotr Matth Modelahnhagenhewії stackлю Fland stack宝owy Staniscipeazi Stanis Cheahnosing Stanis numer Matthfattrt Fland Che Stanisowy Cheijkuratags Matth Che Stanisowyazi Mos Culturaowy Cheowytags$ Basicabor Esptags Stanis ub Comics Cheelsk Eb Che Che openommowy Cheaga Che savotrtagstagséquipe Che mort Matthprevious Alliancehagenahn Cheague Stanis stack model Che ind Matthosing Che rag Cheлюcipefatt Flandugentagsovyovi Stanis Stanis Chert Fri Stanisazimodelijk Stanis Stanisosing看$ Cheowy Matth Stanisowy Cheowyabor Mos Che Che Che Matth Che Stanis Matth Stanis$ Cheowy Che Espіїahn Stanis$ Stanis ub Stanisotr宝 Modelopen stackabor Basic· Culturaelsk Dom numer fri Stanis ragowy Eb Fland fre Stanisemblyovi Fland rfattrt Che model Stanis Che Che Che Stanis Fland Che Stanisazi Cheosingcipe mort Fland Chetags Stanis Fri proced Chelem FlandowyCheowyrt Che Matth Stanis Espahn Cheazi Che Mosijk Che Stanisarian Fland Che Fland stackopen ubahn宝 open Stanisahn Chetagsrt numer Basicowyowy Cheaborлюahn numerahnowy Stanis numerelsk rag Stanisotr Che*$ Che Che le Stanis Che Cheowy freahn Che宝ahn Che Fland mort Stanis Eb Basic Basic Friahn Flandowyместosing Matth Esp Domtags
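For what it's worth, the degenerate output above is easy to flag programmatically. The sketch below is a simple, hypothetical heuristic (not part of llama.cpp) that checks whether a handful of tokens dominate the text, as "Che"/"Stanis" do in the broken run; the function name and threshold are my own choices.

```python
from collections import Counter

def looks_degenerate(text: str, top_n: int = 5, threshold: float = 0.2) -> bool:
    """Flag output where a few tokens dominate, as in the broken -ngl >= 1 run."""
    tokens = text.split()
    if len(tokens) < 20:
        return False  # too short to judge either way
    # Fraction of the output covered by the top_n most frequent tokens.
    top = sum(count for _, count in Counter(tokens).most_common(top_n))
    return top / len(tokens) > threshold
```

On the broken output above, a couple of tokens make up most of the text, so the ratio is near 1.0; on ordinary prose the top five tokens cover only a small fraction.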
I have the same issue.
Tested with a 6800 XT and an RX 580 on my Intel Mac, and I have the same issue.
I also have the same issue, and I've tried OpenCL mode and get gibberish for anything over `-ngl 1`. My setup is an Intel Mac with an AMD GPU.
I have the same issue with a 2017 13-inch MBP (2.3 GHz Dual-Core Intel Core i5, Intel Iris Plus Graphics 640 1536 MB, 8 GB 2133 MHz LPDDR3, macOS Ventura 13.6.4).
Using a smaller quantisation (Q2_K), instead of a UI freeze I get an endless stream of `ggml_metal_graph_compute: command buffer 0 failed with status 5` errors. According to what someone said in a Reddit comment, and unless I'm mistaken, this might be a memory issue. My `recommendedMaxWorkingSetSize` is 1610.61 MB, which means even 1 layer of the smallest 7B quantisation is not going to fit on my computer using Metal.
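A back-of-envelope check supports this, if I understand the Metal backend of that era correctly: the entire weights buffer was mapped into a Metal buffer regardless of `-ngl`, so the full model file size plus KV-cache/scratch overhead had to fit under `recommendedMaxWorkingSetSize`. The sketch below uses hypothetical round numbers (a 7B Q2_K file is roughly 2.8 GiB, overhead guessed at ~400 MiB); the function and figures are illustrative, not measurements.

```python
MiB = 1024 * 1024

def fits_working_set(model_bytes: int, overhead_bytes: int,
                     working_set_bytes: int) -> bool:
    """Assumes the whole weights file is mapped into Metal memory
    regardless of -ngl, plus KV-cache/scratch overhead."""
    return model_bytes + overhead_bytes <= working_set_bytes

# 7B Q2_K (~2.8 GiB) + ~400 MiB overhead vs. a 1610.61 MB working set:
# does not fit, consistent with the status-5 command-buffer failures.
print(fits_working_set(2800 * MiB, 400 * MiB, int(1610.61 * MiB)))
```

Under these assumptions, only a model far smaller than any 7B quantisation would fit on a GPU with this working-set limit.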
This issue was closed because it has been inactive for 14 days since being marked as stale.