[M1 Mac] Chat crashes when using Metal Device and Model too large
Bug Report
Trying to chat with TheBloke's Mixtral model, I see an immediate crash when starting the chat. It does not even get to the stage where the model is loaded. The workaround is to explicitly pick CPU in the Device settings dialog. Resetting back to "Application Default" (or explicitly setting Metal) reproduces the crash 100% of the time.
Edit 2024-09-09: Further experiments confirmed that this is related to model size; in my case I am running into VRAM limitations. The crash happens whenever the model is larger than the maximum unified memory assigned to the GPU.
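For anyone who wants to check this budget on their own machine, here is a minimal sketch (the model path is a placeholder for whatever GGUF file you downloaded) that compares the Metal working-set budget against the model's file size. Note that `recommendedMaxWorkingSetSize` is only the OS hint; GPT4All/llama.cpp may budget differently, so treat this as a ballpark check:

```swift
import Metal
import Foundation

// Compare the Metal working-set budget against a model's file size.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device found")
}

let gib = Double(1 << 30)
let vramBudget = Double(device.recommendedMaxWorkingSetSize) / gib
let totalRAM = Double(ProcessInfo.processInfo.physicalMemory) / gib
print(String(format: "Metal working-set budget: %.1f GiB", vramBudget))
print(String(format: "Total unified memory:     %.1f GiB", totalRAM))

// Placeholder path -- substitute the GGUF file you actually downloaded.
let modelPath = "mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf"
if let size = (try? FileManager.default
        .attributesOfItem(atPath: modelPath)[.size]) as? UInt64 {
    let modelGiB = Double(size) / gib
    print(String(format: "Model file size:          %.1f GiB", modelGiB))
    if modelGiB > vramBudget {
        print("Model alone exceeds the Metal budget; a Metal load will fail.")
    }
}
```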
Steps to Reproduce
- Leave default settings (after fresh install)
- Install the Mixtral Model
- Start a Chat
- GPT4All crashes
Thread 13 Crashed:: 03c9a3a3-e3fb-4768-acf5-76d976e5ecfc
0 ??? 0x0 ???
1 libllamamodel-mainline-metal.dylib 0x1185f295c llm_load_tensors(llama_model_loader&, llama_model&, int, llama_split_mode, int, float const*, bool, bool (*)(float, void*), void*) + 158428
2 libllamamodel-mainline-metal.dylib 0x1185aad64 llama_load_model_from_file + 4960
3 libllamamodel-mainline-metal.dylib 0x118504e80 LLamaModel::loadModel(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, int, int) + 440
4 gpt4all 0x104ac8a30 ChatLLM::loadNewModel(ModelInfo const&, QMap<QString, QVariant>&) + 2188
5 gpt4all 0x104ac67f0 ChatLLM::loadModel(ModelInfo const&) + 1896
6 QtCore 0x10855dec8 QObject::event(QEvent*) + 612
7 QtCore 0x10851c408 QCoreApplicationPrivate::notify_helper(QObject*, QEvent*) + 384
8 QtCore 0x10851bf88 QCoreApplication::notifyInternal2(QObject*, QEvent*) + 292
9 QtCore 0x10851d238 QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) + 1428
10 QtCore 0x108687700 QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) + 84
11 QtCore 0x1085258fc QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) + 532
12 QtCore 0x10860f01c QThread::exec() + 280
13 QtCore 0x10868b8dc 0x1084b0000 + 1947868
14 libsystem_pthread.dylib 0x198121f94 _pthread_start + 136
15 libsystem_pthread.dylib 0x19811cd34 thread_start + 8
Expected Behavior
I would expect that setting the device to Metal, either directly or via "Application Default", should work, or at least fall back to CPU gracefully.
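For illustration, such a fallback could look like the sketch below. The `loadOnMetal`/`loadOnCPU` helpers and the 2 GiB context reserve are hypothetical stand-ins for GPT4All's real (C++) loading path; the point is only the pre-flight check:

```swift
import Metal
import Foundation

// Hypothetical helpers standing in for GPT4All's actual loading code.
func loadOnMetal(path: String) throws { /* ... */ }
func loadOnCPU(path: String) throws { /* ... */ }

func loadModelGracefully(path: String) throws {
    let budget = MTLCreateSystemDefaultDevice()?.recommendedMaxWorkingSetSize ?? 0
    let modelSize = (try? FileManager.default
        .attributesOfItem(atPath: path)[.size]) as? UInt64 ?? .max

    // Assumed headroom for the KV cache and scratch buffers; the real
    // number depends on context length and model architecture.
    let contextReserve: UInt64 = 2 << 30  // ~2 GiB

    if budget > contextReserve, modelSize <= budget - contextReserve {
        try loadOnMetal(path: path)
    } else {
        try loadOnCPU(path: path)  // fall back instead of crashing
    }
}
```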
Your Environment
- GPT4All version: 3.2.1
- Operating System: macOS 14.6.1 (23G93)
- System: MacBook Pro M1 (2021), 32GB RAM
- Chat model used (if applicable): https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF
You are most likely running out of VRAM, which on Apple Silicon is carved out of that 32GB of unified RAM. The tell is the failure in llm_load_tensors, i.e. while loading a model that is roughly 24GB of data, plus the room that must be set aside for context. By default I believe only 2/3 of memory can be used as VRAM, i.e. roughly 21GB in your case, so you run out of VRAM almost immediately. It works on CPU because you have the whole 32GB available (minus system overhead).
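To make the arithmetic concrete (assuming the 2/3 default and a ~24GB Q4 quantization of Mixtral; all values here are assumptions from the discussion, not measurements):

```swift
// Back-of-the-envelope numbers for this machine.
let totalRAM = 32.0                     // GiB of unified memory
let metalBudget = totalRAM * 2.0 / 3.0  // ~21.3 GiB by the assumed default
let modelSize = 24.0                    // GiB, rough size of a Q4 Mixtral

print(modelSize > metalBudget)  // true -> Metal load fails
print(modelSize < totalRAM)     // true -> CPU load succeeds
```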
@chrisbarrera you are right, the crash is related to model size. I tried a smaller 10GB model and confirmed that it also works on Metal.
I will update the issue accordingly.