
[M1 Mac] Chat crashes when using Metal Device and Model too large

Open GrimmiMeloni opened this issue 1 year ago • 2 comments

Bug Report

Trying to chat with TheBloke's Mixtral, I see immediate crashes when starting the chat. It does not even get to the stage where the model is loaded. The workaround is to explicitly pick CPU in the Device Settings dialog. Resetting back to "Application Default" (or explicitly setting Metal) reproduces the crash 100% of the time.

Edit 2024-09-09: Further experiments confirmed that this is related to model size; in my case I am running into VRAM limitations. For the crash to happen, a model larger than the maximum unified memory assigned to the GPU must be used.

Steps to Reproduce

  1. Leave default settings (after fresh install)
  2. Install the Mixtral Model
  3. Start a Chat
  4. GPT4All crashes
Thread 13 Crashed:: 03c9a3a3-e3fb-4768-acf5-76d976e5ecfc

0   ???                           	               0x0 ???
1   libllamamodel-mainline-metal.dylib	       0x1185f295c llm_load_tensors(llama_model_loader&, llama_model&, int, llama_split_mode, int, float const*, bool, bool (*)(float, void*), void*) + 158428
2   libllamamodel-mainline-metal.dylib	       0x1185aad64 llama_load_model_from_file + 4960
3   libllamamodel-mainline-metal.dylib	       0x118504e80 LLamaModel::loadModel(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, int) + 440
4   gpt4all                       	       0x104ac8a30 ChatLLM::loadNewModel(ModelInfo const&, QMap<QString, QVariant>&) + 2188
5   gpt4all                       	       0x104ac67f0 ChatLLM::loadModel(ModelInfo const&) + 1896
6   QtCore                        	       0x10855dec8 QObject::event(QEvent*) + 612
7   QtCore                        	       0x10851c408 QCoreApplicationPrivate::notify_helper(QObject*, QEvent*) + 384
8   QtCore                        	       0x10851bf88 QCoreApplication::notifyInternal2(QObject*, QEvent*) + 292
9   QtCore                        	       0x10851d238 QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) + 1428
10  QtCore                        	       0x108687700 QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) + 84
11  QtCore                        	       0x1085258fc QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) + 532
12  QtCore                        	       0x10860f01c QThread::exec() + 280
13  QtCore                        	       0x10868b8dc 0x1084b0000 + 1947868
14  libsystem_pthread.dylib       	       0x198121f94 _pthread_start + 136
15  libsystem_pthread.dylib       	       0x19811cd34 thread_start + 8

Expected Behavior

I would expect that setting the device to Metal, whether directly or via "Application Default", would either work or at least fall back to CPU gracefully.
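
For illustration, a graceful fallback could look something like the sketch below, written against the llama.cpp API. This is not GPT4All's actual code: llama_model_default_params(), n_gpu_layers, and llama_load_model_from_file() are real llama.cpp API (the last one appears in the stack trace above), while vram_budget_bytes() is a hypothetical helper that would wrap Metal's MTLDevice recommendedMaxWorkingSetSize on macOS.

```cpp
// Sketch only: skip GPU offload when the weights alone exceed the VRAM budget.
#include <cstdint>
#include <filesystem>

#include "llama.h"

// Hypothetical helper: on macOS this would query
// [MTLDevice recommendedMaxWorkingSetSize] via Objective-C++.
uint64_t vram_budget_bytes();

llama_model *load_with_fallback(const char *path) {
    llama_model_params params = llama_model_default_params();

    // Rough lower bound on the GPU memory needed: the size of the weights
    // on disk (KV cache and compute buffers come on top of this).
    const std::uintmax_t weights = std::filesystem::file_size(path);

    if (weights > vram_budget_bytes()) {
        params.n_gpu_layers = 0; // keep everything on the CPU instead of crashing
    }
    return llama_load_model_from_file(path, params);
}
```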

Your Environment

  • GPT4All version: 3.2.1
  • Operating System: macOS 14.6.1 (23G93)
  • System: MacBook Pro M1 (2021), 32GB RAM
  • Chat model used (if applicable): https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF

GrimmiMeloni avatar Sep 05 '24 09:09 GrimmiMeloni

You are most likely running out of VRAM, which on Apple Silicon is carved out of that 32GB of unified RAM. The tell is the failure in llm_load_tensors, i.e. while loading a model that is roughly 24GB of data, on top of which room must be set aside for context. By default, I believe about 2/3 of memory can be used as VRAM, i.e. roughly 21GB in your case, so you run out of VRAM almost immediately. It works on CPU because you have the whole 32GB available (minus system overhead).
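
Concretely, the arithmetic looks like this (a back-of-the-envelope sketch; the 2/3 figure is the default cap as I understand it, and the 24GB model size is approximate):

```cpp
// Rough VRAM budget math for a 32GB M1, as described above.
#include <cstdio>

int main() {
    const double GiB = 1024.0 * 1024.0 * 1024.0;
    const double total_ram   = 32 * GiB;           // unified memory
    const double vram_budget = total_ram * 2 / 3;  // default Metal working-set cap, ~21.3 GiB
    const double model_size  = 24 * GiB;           // Mixtral 8x7B Q4 weights, roughly

    std::printf("VRAM budget: %.1f GiB, model: %.1f GiB -> %s\n",
                vram_budget / GiB, model_size / GiB,
                model_size > vram_budget ? "does not fit on Metal" : "fits");
    return 0;
}
```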

chrisbarrera avatar Sep 05 '24 14:09 chrisbarrera

@chrisbarrera you are right, the crash does seem to be related to model size. I tried a smaller 10GB model and confirmed that it also works on Metal.

I will update the issue accordingly.

GrimmiMeloni avatar Sep 09 '24 06:09 GrimmiMeloni