GUI won't start on Windows (unhandled exception in ggml_vk_available_devices)

Open ADD-eNavarro opened this issue 10 months ago • 24 comments

System Info

Hi, I'm running GPT4All on Windows Server 2022 Standard, AMD EPYC 7313 16-Core Processor at 3GHz, 30GB of RAM. This computer also happens to have an A100, I'm hoping the issue is not there! GPT4All was working fine until the other day, when I updated to version 2.4.9 and all of a sudden it wouldn't start. No feedback whatsoever, it just doesn't start. I've downloaded the 2.5 pre-release today but I'm still having the same issue. Here's the event viewer record detail: Error GPT4All pre-release.txt

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [ ] backend
  • [ ] bindings
  • [ ] python-bindings
  • [X] chat-ui
  • [ ] models
  • [ ] circleci
  • [ ] docker
  • [ ] api

Reproduction

GPT4All just doesn't start, even with admin privileges granted.

Expected behavior

Should start!

ADD-eNavarro avatar Oct 06 '23 06:10 ADD-eNavarro

It would be really helpful if you could build GPT4All from source in Debug mode, and run it under either the Visual Studio debugger, or windbg, in order to get the call stack. Unfortunately, the binaries we publish are stripped Release builds with very little information to assist debugging.

cebtenzzre avatar Oct 06 '23 14:10 cebtenzzre

That won't be easy. I'm not much of a developer, and C++ is not among the languages I know well. Also, my enterprise imposes security constraints on installing or running third-party code (I had to ask permission and wait a week just to have the program installed). All in all, I don't see myself doing that. Any volunteers?

ADD-eNavarro avatar Oct 10 '23 13:10 ADD-eNavarro

You mean 2.4.19 not 2.4.9, right?

First of all, one thing you can try is rename your settings file, which is located at C:\Users\<name>\AppData\Roaming\nomic.ai\GPT4All.ini. Try giving it a different extension (so you have it backed up). A new one with default values will be created automatically the next time you start GPT4All.

If that doesn't help, you can also try adding a line device=CPU to the General section, or change the line if device= already exists there, e.g.:

[General]
device=CPU
...

Close the program before you do that and restart it afterwards.

cosmic-snow avatar Oct 10 '23 13:10 cosmic-snow

> You mean 2.4.19 not 2.4.9, right?

Yes, sorry, already updated the issue title.

> First of all, one thing you can try is rename your settings file, which is located at C:\Users\<name>\AppData\Roaming\nomic.ai\GPT4All.ini. Try giving it a different extension (so you have it backed up). A new one with default values will be created automatically the next time you start GPT4All.

Changed the extension, no success: GPT4All still won't start.

> If that doesn't help, you can also try adding a line device=CPU to the General section, or change the line if device= already exists there, e.g.:
>
> [General]
> device=CPU
> ...
>
> Close the program before you do that and restart it afterwards.

I didn't need to close the program, for obvious reasons. Added the device configuration, still won't start :(

ADD-eNavarro avatar Oct 10 '23 14:10 ADD-eNavarro

I uploaded a debug build of the installer to the releases page; it's called gpt4all-installer-win64-v2.5.0-pre1-debug.exe. If you install that, the output of Event Viewer will at least have some meaning to us. A WinDbg session would be even better:

  1. Download the Windows SDK
  2. Install it, clearing all checkboxes except for "Debugging Tools for Windows", which is the only one you would need
  3. Start WinDbg (X64)
  4. File > Open Executable, navigate to C:\Program Files\gpt4all\bin\chat.exe
  5. If it stops at ntdll!LdrpDoDebuggerBreak, press the F5 key to continue
  6. If it stops again, go to View > Call Stack, which will hopefully have useful information about the crash

cebtenzzre avatar Oct 10 '23 14:10 cebtenzzre

Here's the result of following your instructions with Windbg:

image

ADD-eNavarro avatar Oct 18 '23 10:10 ADD-eNavarro

> Here's the result of following your instructions with Windbg:

Can you continue past that with F5? I think that's just another bug in Windows breakpoint handling, not an actual issue with the code. You should be able to continue until you get a call stack with lines other than ntdll!... in it.

cebtenzzre avatar Oct 18 '23 14:10 cebtenzzre

Hope this is what you need: image

ADD-eNavarro avatar Oct 18 '23 15:10 ADD-eNavarro

> Hope this is what you need:

Yes, that is very helpful, thanks.

edit: Could you please try to get info for the exception by running the .exr -1 command after windbg stops at that point?

cebtenzzre avatar Oct 18 '23 15:10 cebtenzzre

Sure thing, here it goes: image

ADD-eNavarro avatar Oct 19 '23 06:10 ADD-eNavarro

Unfortunately, I'm not sure how to get the exception message with WinDbg. Here's another option:

I uploaded a console-enabled build (gpt4all-installer-win64-v2.5.0-pre2-debug-console.exe) to the pre-release.

It would be helpful if you could start chat.exe via the command line - install that version, use "Open File Location" on the shortcut to find chat.exe, shift-right-click in the folder and open a powershell or command prompt there, and run .\chat (powershell) or chat (command prompt).

If there is any console output, please post it here.

cebtenzzre avatar Oct 19 '23 14:10 cebtenzzre

Morning!

Got this: image

I'm afraid all three options result in the process stopping without further message: image

ADD-eNavarro avatar Oct 20 '23 05:10 ADD-eNavarro

So, are we out of luck, @cebtenzzre ?

ADD-eNavarro avatar Nov 07 '23 07:11 ADD-eNavarro

Unless you can debug it with Visual Studio (which I know will provide the exception information), I'm not sure what else to do.

cebtenzzre avatar Nov 07 '23 17:11 cebtenzzre

Just a suggestion for debugging this: what about using procdump (from Microsoft) to help capture the stack trace? Something like: procdump -mm -x . chat.exe (assuming procdump v11 and that it's in the current path). The -mm switch selects the minidump format, which captures the basic process details. You can then use something like WinDbg (or other tools) to debug the dump. Again, just a thought to help capture the instant it crashes.

H4CKS4F3 avatar Nov 07 '23 19:11 H4CKS4F3

@H4CKS4F3, WinDbg was already used, if you read back a little. I gave procdump a try; here are the two files: the first one with -mm and, since I couldn't see a thing in there, the second one without the minidump parameter. dump.dmp dump2.dmp

ADD-eNavarro avatar Nov 08 '23 11:11 ADD-eNavarro

@ADD-eNavarro, run the following and attach the dump: procdump -mm -e -x . chat.exe

Since procdump does not write a dump on unhandled exceptions by default, the actual exception was lost from the earlier minidump.

H4CKS4F3 avatar Nov 08 '23 23:11 H4CKS4F3

Here's the result of that last procdump run: dump3_231109_073726.dmp

ADD-eNavarro avatar Nov 09 '23 06:11 ADD-eNavarro

Now we're getting somewhere:

KERNELBASE!RaiseException+6c
VCRUNTIME140!_CxxThrowException+90 [D:\a\_work\1\s\src\vctools\crt\vcruntime\src\eh\throw.cpp @ 75]
llmodel+ba4dc
0x0000002f`b14fd2b8

Unfortunately, I no longer have a copy of the debug info for that build of GPT4All, so I can't resolve llmodel+ba4dc to anything specific.

Here is a newer build that you can install and run the same procdump command on: gpt4all-installer-win64-v2.5.2.r8.gd4ce9f4-debug-console.exe

I'll keep that build tree in a separate folder so I'll be able to debug it when you reply.

cebtenzzre avatar Nov 09 '23 18:11 cebtenzzre

New dump: dump4_231110_094643.dmp

ADD-eNavarro avatar Nov 10 '23 08:11 ADD-eNavarro

Here is the call stack when the exception is thrown:

KERNELBASE!RaiseException+0x6c
VCRUNTIME140D!_CxxThrowException+0x120
llmodel!vk::detail::throwResultException+0x29c
llmodel!vk::resultCheck+0x23
llmodel!vk::Instance::enumeratePhysicalDevices<std::allocator<vk::PhysicalDevice>,vk::DispatchLoaderDynamic>+0xf7
llmodel!kp::Manager::listDevices+0x38
llmodel!ggml_vk_available_devices+0xf6
llmodel!LLModel::availableGPUDevices+0x4f
chat!MySettings::MySettings+0x74
chat!MyPrivateSettings::MyPrivateSettings+0x14
chat!`anonymous namespace'::Q_QGS_settingsInstance::innerFunction+0x36
chat!QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance>::Holder<`anonymous namespace'::Q_QGS_settingsInstance>+0x1c
chat!QGlobalStatic<QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance> >::instance+0x4c
chat!QGlobalStatic<QtGlobalStatic::Holder<`anonymous namespace'::Q_QGS_settingsInstance> >::operator()+0x24
chat!MySettings::globalInstance+0x12
chat!main+0x12f
chat!invoke_main+0x39
chat!__scrt_common_main_seh+0x12e
chat!__scrt_common_main+0xe
chat!mainCRTStartup+0xe
kernel32!BaseThreadInitThunk+0x10
ntdll!RtlUserThreadStart+0x2b

It's caused by VK_ERROR_DEVICE_LOST: Capture

So it looks like we need to catch Vulkan exceptions from komputeManager()->listDevices() and ignore them. It seems like there is some issue with your GPU driver that prevents Vulkan from being used.
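
A minimal sketch of that "catch and ignore" idea, not the actual GPT4All patch. Everything named here (GpuDevice, listDevicesOrEmpty, and the two simulated enumerate functions) is hypothetical and stands in for the real komputeManager()->listDevices() call. It also assumes the Vulkan error thrown in the stack above derives from std::exception, which should be checked against the real headers.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a GPU device record; not the real GPT4All type.
struct GpuDevice {
    std::string name;
};

// Hypothetical wrapper: runs a device-enumeration callback and treats any
// exception as "no usable GPU" instead of letting it escape to main().
// Assumption: the real Vulkan error type derives from std::exception.
std::vector<GpuDevice> listDevicesOrEmpty(
    const std::function<std::vector<GpuDevice>()> &enumerate) {
    try {
        return enumerate();
    } catch (const std::exception &e) {
        // Log the failure and fall back to an empty device list.
        std::cerr << "GPU enumeration failed, continuing without GPU: "
                  << e.what() << '\n';
        return {};
    }
}

// Simulates a broken driver: always throws, like the DEVICE_LOST case above.
std::vector<GpuDevice> brokenDriverEnumerate() {
    throw std::runtime_error("VK_ERROR_DEVICE_LOST (simulated)");
}

// Simulates a working driver that reports a single device.
std::vector<GpuDevice> workingDriverEnumerate() {
    return {GpuDevice{"Simulated GPU"}};
}
```

With a wrapper like this, a settings constructor that asks for the GPU list would receive an empty list on a machine with a broken Vulkan driver and could offer CPU only, rather than terminating before the window appears.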

cebtenzzre avatar Nov 10 '23 21:11 cebtenzzre

Anything I can do then?

ADD-eNavarro avatar Nov 13 '23 07:11 ADD-eNavarro

From my perspective, unless you can suggest a patch, it looks like you'll need to wait for the developers to do something. One thing I'd suggest is updating drivers, since this seems to be a driver issue. I was actually suffering from this issue too, but "something changed" and it started working again. Maybe I updated drivers, but I can't be certain. I have an NVIDIA card, so I may have updated the driver + CUDA.

H4CKS4F3 avatar Nov 16 '23 00:11 H4CKS4F3

Following @H4CKS4F3's advice, we've updated CUDA to version 12.3.1, which updated the NVIDIA drivers from 545.84 to 546.12. Other changes that came along were:

  • Nsight Compute, 2023.3.1 -> 2023.3.1
  • Nsight Visual Studio Edition, 2023.3.0.23xxx -> 2023.3.1.23311

But GPT4All still doesn't start. So maybe it's not the drivers.

ADD-eNavarro avatar Dec 11 '23 12:12 ADD-eNavarro

i had exactly this problem ... and solved it .. deactivation of Antivirus is NOT enough ... you need to reinstall .. here's my text from another post where I solved my own problem / issue:

hey .. just an update for Windows users:

The reason why, on Windows, chat.exe shows up in the task manager but the GUI never opens seems to be interference from AVG Antivirus software. After uninstalling it, gpt4all version 2.8 pre opened with CUDA and everything!!

I made a discord post in the Gpt4all channel here: https://discord.com/channels/1076964370942267462/1090651132390543400/1242561840056111114

thus uninstall antivirus software ... run chat.exe again ... then reinstall antivirus .... at least in my case this is the solution

neural-oracle avatar May 21 '24 19:05 neural-oracle

> i had exactly this problem

Different issue. OP experienced a crash caused by a bad interaction with a non-functional Vulkan driver.

cebtenzzre avatar May 21 '24 20:05 cebtenzzre