gpt4all
GUI won't start on Windows (unhandled exception in ggml_vk_available_devices)
System Info
Hi, I'm running GPT4All on Windows Server 2022 Standard, AMD EPYC 7313 16-Core Processor at 3GHz, 30GB of RAM. This computer also happens to have an A100, I'm hoping the issue is not there! GPT4All was working fine until the other day, when I updated to version 2.4.9 and all of a sudden it wouldn't start. No feedback whatsoever, it just doesn't start. I've downloaded the 2.5 pre-release today but I'm still having the same issue. Here's the event viewer record detail: Error GPT4All pre-release.txt
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [ ] backend
- [ ] bindings
- [ ] python-bindings
- [X] chat-ui
- [ ] models
- [ ] circleci
- [ ] docker
- [ ] api
Reproduction
GPT4All just doesn't start, even with admin privileges granted.
Expected behavior
Should start!
It would be really helpful if you could build GPT4All from source in Debug mode, and run it under either the Visual Studio debugger, or windbg, in order to get the call stack. Unfortunately, the binaries we publish are stripped Release builds with very little information to assist debugging.
That won't be easy. I'm not much of a developer, and C++ is not among the languages I know well. Also, my enterprise imposes security constraints on installing or running third-party code (I had to ask permission and wait for a week just to have the program installed). All in all, I don't see myself doing that. Any volunteers?
You mean 2.4.19 not 2.4.9, right?
First of all, one thing you can try is rename your settings file, which is located at C:\Users\<name>\AppData\Roaming\nomic.ai\GPT4All.ini. Try giving it a different extension (so you have it backed up). A new one with default values will be created automatically the next time you start GPT4All.
If that doesn't help, you can also try adding a line device=CPU to the General section, or change the line if device= already exists there, e.g.:
[General]
device=CPU
...
Close the program before you do that and restart it afterwards.
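For anyone who would rather script that edit than do it by hand, here is a minimal sketch of the same change as a plain text transformation. It is not how GPT4All itself reads or writes its settings; `setGeneralDevice` is a hypothetical helper, and it assumes the file uses the simple `[Section]` / `key=value` layout shown above. Reading and writing GPT4All.ini is left to the caller, and the program should still be closed while the file is modified.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns a copy of `ini` whose [General] section
// contains exactly one device=<value> line.
// - An existing device= line in [General] is replaced.
// - If [General] has no device= line, one is added right after the header.
// - If there is no [General] section at all, one is appended at the end.
std::string setGeneralDevice(const std::string &ini, const std::string &value) {
    std::istringstream in(ini);
    std::ostringstream out;
    std::string line;
    bool inGeneral = false; // are we currently inside [General]?
    bool written = false;   // has the new device= line been emitted?

    while (std::getline(in, line)) {
        // Tolerate Windows line endings (CRLF).
        if (!line.empty() && line.back() == '\r') line.pop_back();

        if (!line.empty() && line.front() == '[') {
            // Section header: track whether we are entering [General].
            inGeneral = (line == "[General]");
            out << line << '\n';
            if (inGeneral && !written) {
                out << "device=" << value << '\n';
                written = true;
            }
            continue;
        }

        // Drop any old device= entry in [General]; the new one is already written.
        if (inGeneral && line.rfind("device=", 0) == 0) continue;

        out << line << '\n';
    }

    // No [General] section was found: append one.
    if (!written) out << "[General]\ndevice=" << value << '\n';
    return out.str();
}
```

Calling `setGeneralDevice(contents, "CPU")` on the file's contents and writing the result back has the same effect as the manual edit; keys in other sections are left untouched.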
You mean 2.4.19 not 2.4.9, right?
Yes, sorry, already updated the issue title.
First of all, one thing you can try is rename your settings file, which is located at
C:\Users\<name>\AppData\Roaming\nomic.ai\GPT4All.ini. Try giving it a different extension (so you have it backed up). A new one with default values will be created automatically the next time you start GPT4All.
Changed the extension, no success: GPT4All still won't start.
If that doesn't help, you can also try adding a line device=CPU to the General section, or change the line if device= already exists there, e.g.: [General] device=CPU ... Close the program before you do that and restart it afterwards.
I didn't need to close the program for obvious reasons. Added the device configuration, still won't start :(
I uploaded a debug build of the installer to the releases page, it's called gpt4all-installer-win64-v2.5.0-pre1-debug.exe. If you install that, the output of Event Viewer will at least have some meaning to us. windbg would be even better:
- Download the Windows SDK
- Install it, clearing all checkboxes except for "Debugging Tools for Windows", which is the only one you would need
- Start WinDbg (X64)
- File > Open Executable, navigate to C:\Program Files\gpt4all\bin\chat.exe
- If it stops at ntdll!LdrpDoDebuggerBreak, press the F5 key to continue
- If it stops again, go to View > Call Stack, which will hopefully have useful information about the crash
Here's the result of following your instructions with Windbg:
Can you continue past that with F5? I think that's just another bug in Windows breakpoint handling, not an actual issue with the code. You should be able to continue until you get a call stack with lines other than ntdll!... in it.
Hope this is what you need:
Yes, that is very helpful, thanks.
edit: Could you please try to get info for the exception by running the .exr -1 command after windbg stops at that point?
Sure thing, here it goes:
Unfortunately, I'm not sure how to get the exception message with WinDbg. Here's another option:
I uploaded a console-enabled build (gpt4all-installer-win64-v2.5.0-pre2-debug-console.exe) to the pre-release.
It would be helpful if you could start chat.exe via the command line - install that version, use "Open File Location" on the shortcut to find chat.exe, shift-right-click in the folder and open a powershell or command prompt there, and run .\chat (powershell) or chat (command prompt).
If there is any console output, please post it here.
Morning!
Got this:
I'm afraid all three options result in the process stopping without further message:
So, are we out of luck, @cebtenzzre ?
Unless you can debug it with Visual Studio (which I know will provide the exception information), I'm not sure what else to do.
Just a suggestion for debugging this: what about using procdump (from Microsoft) to help capture the stack trace? Something like: procdump -mm -x . chat.exe (assuming procdump v11 and that it's in the current path). The -mm switch selects the minidump format, which captures the basic process details. You can then use something like WinDbg (or other tools) to debug it. Again, just a thought to help capture the instant it crashes.
@H4CKS4F3 , WinDbg was already used, if you read back a little. I gave a try to procdump, here are the two files, first one with -mm and, since I couldn't see a thing in there, the second one without the minidump parameter. dump.dmp dump2.dmp
@ADD-eNavarro run the following and attach the dump. Since procdump defaults to not dump on unhandled exceptions, it lost the actual exception in the minidump. procdump -mm -e -x . chat.exe
Here's the result of that last procdump run: dump3_231109_073726.dmp
Now we're getting somewhere:
KERNELBASE!RaiseException+6c
VCRUNTIME140!_CxxThrowException+90 [D:\a\_work\1\s\src\vctools\crt\vcruntime\src\eh\throw.cpp @ 75]
llmodel+ba4dc
0x0000002f`b14fd2b8
Unfortunately, I no longer have a copy of the debug info for that build of GPT4All, so I can't resolve llmodel+ba4dc to anything specific.
Here is a newer build that you can install and run the same procdump command on: gpt4all-installer-win64-v2.5.2.r8.gd4ce9f4-debug-console.exe
I'll keep that build tree in a separate folder so I'll be able to debug it when you reply.
New dump: dump4_231110_094643.dmp
Here is the call stack when the exception is thrown:
KERNELBASE!RaiseException+0x6c
VCRUNTIME140D!_CxxThrowException+0x120
llmodel!vk::detail::throwResultException+0x29c
llmodel!vk::resultCheck+0x23
llmodel!vk::Instance::enumeratePhysicalDevices<std::allocator<vk::PhysicalDevice>,vk::DispatchLoaderDynamic>+0xf7
llmodel!kp::Manager::listDevices+0x38
llmodel!ggml_vk_available_devices+0xf6
llmodel!LLModel::availableGPUDevices+0x4f
chat!MySettings::MySettings+0x74
chat!MyPrivateSettings::MyPrivateSettings+0x14
chat!`anonymous namespace'::Q_QGS_settingsInstance::innerFunction+0x36
chat!QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance>::Holder<`anonymous namespace'::Q_QGS_settingsInstance>+0x1c
chat!QGlobalStatic<QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance> >::instance+0x4c
chat!QGlobalStatic<QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance> >::operator()+0x24
chat!MySettings::globalInstance+0x12
chat!main+0x12f
chat!invoke_main+0x39
chat!__scrt_common_main_seh+0x12e
chat!__scrt_common_main+0xe
chat!mainCRTStartup+0xe
kernel32!BaseThreadInitThunk+0x10
ntdll!RtlUserThreadStart+0x2b
It's caused by VK_ERROR_DEVICE_LOST:
So it looks like we need to catch Vulkan exceptions from komputeManager()->listDevices() and ignore them. It seems like there is some issue with your GPU driver that prevents Vulkan from being used.
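A minimal sketch of that idea, not the actual patch: the enumeration call is wrapped so that any exception turns into "no GPUs available" instead of taking down the process at startup. `GpuDevice`, `availableDevicesOrEmpty`, and `brokenDriverEnumerate` are hypothetical names, and the enumerator is passed in as a parameter so the example compiles without Vulkan headers; in the real code it would wrap komputeManager()->listDevices(). It also assumes, as I understand the Vulkan-Hpp headers, that errors such as vk::DeviceLostError derive from std::system_error and are therefore caught by a std::exception handler.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical stand-in for the device records returned by the real backend.
struct GpuDevice {
    std::string name;
};

// Calls `enumerate` and converts any exception into an empty device list,
// so that a broken Vulkan driver only disables GPU support.
std::vector<GpuDevice> availableDevicesOrEmpty(
        const std::function<std::vector<GpuDevice>()> &enumerate) {
    try {
        return enumerate();
    } catch (const std::exception &e) {
        // Assumption: Vulkan-Hpp result exceptions are std::exception subclasses.
        std::cerr << "GPU enumeration failed, continuing without GPU: "
                  << e.what() << '\n';
        return {};
    }
}

// Simulates the failure from the call stack above: enumeration throws
// instead of returning a list. The error code here is only illustrative.
std::vector<GpuDevice> brokenDriverEnumerate() {
    throw std::system_error(std::make_error_code(std::errc::io_error),
                            "enumeratePhysicalDevices: device lost");
}
```

With this wrapper, `availableDevicesOrEmpty(brokenDriverEnumerate)` logs the error and returns an empty list, which the settings code could treat the same way as a machine with no supported GPU.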
Anything I can do then?
From my perspective, unless you can suggest a patch, it looks like you'll need to wait for the developers to do something. One thing I'd suggest is updating drivers, since this seems to be a driver issue. I was actually suffering from this issue too, but "something changed" and it started working again. Maybe I updated drivers, but I can't be certain. I have an NVIDIA card, so I may have updated the driver + CUDA.
Following @H4CKS4F3's advice, we've updated CUDA to version 12.3.1, which updated the NVIDIA drivers from 545.84 to 546.12. Other changes that came along were: Nsight Compute, 2023.3.1 -> 2023.3.1; Nsight Visual Studio Edition, 2023.3.0.23xxx -> 2023.3.1.23311.
But GPT4All still doesn't start. So maybe it's not the drivers.
i had exactly this problem ... and solved it .. deactivation of Antivirus is NOT enough ... you need to reinstall .. here is my text from another post where I solved my own problem / issue:
hey .. just an update for Windows users:
The reason ... why on windows chat.exe is opening in the task manager only but not opening the GUI seems to be an interference with AVG Antivirus software. After uninstalling it gpt4all version 2.8 pre opened with CUDA and everything!!
I made a discord post in the Gpt4all channel here: https://discord.com/channels/1076964370942267462/1090651132390543400/1242561840056111114
thus uninstall antivirus software ... run chat.exe again ... then reinstall antivirus .... at least in my case this is the solution
i had exactly this problem
Different issue. OP experienced a crash caused by a bad interaction with a non-functional Vulkan driver.