Editor hangs when changing "Content" script (on Linux)
Hi, I am not certain that this is specific to Linux, but when I load different "Content" scripts in the editor, the application sometimes hangs, and sometimes it won't even respond to signals (i.e. I have to `kill -9` the process). I ran it in debug mode and found where it gets stuck:
```cpp
7238    // Initiate stalling CPU when GPU is not yet finished with next frame:
7239    if (FRAMECOUNT >= BUFFERCOUNT)
7240    {
7241        const uint32_t bufferindex = GetBufferIndex();
7242        for (int queue = 0; queue < QUEUE_COUNT; ++queue)
7243        {
7244            if (frame_fence[bufferindex][queue] == VK_NULL_HANDLE)
7245                continue;
7246
7247            res = vkWaitForFences(device, 1, &frame_fence[bufferindex][queue], VK_TRUE, 0xFFFFFFFFFFFFFFFF);
7248            assert(res == VK_SUCCESS);
7249
7250            res = vkResetFences(device, 1, &frame_fence[bufferindex][queue]);
7251            assert(res == VK_SUCCESS);
7252        }
7253    }
```
The call to vkWaitForFences hangs. I am new to this API (and to modern graphics in general), but I see that the timeout is very large (0xFFFFFFFFFFFFFFFF ns, i.e. effectively infinite). Is this the right way to handle "CPU stalling"? From what I have been googling, I think it could at least loop on VK_TIMEOUT with a reasonably small timeout. Also, here is the call stack from when I was able to stop the process:
```
* thread #1, name = 'WickedEngineEdi', stop reason = signal SIGSTOP
  * frame #0: 0x00007ffff791d9ed libc.so.6`__poll + 77
    frame #1: 0x00007fffda007cc3 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol36082 + 147
    frame #2: 0x00007fffda422f59 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol44349 + 73
    frame #3: 0x00007fffda407950 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol44160 + 672
    frame #4: 0x00007fffda3239ae libnvidia-glcore.so.550.78`___lldb_unnamed_symbol42754 + 30
    frame #5: 0x0000555555c8668b WickedEngineEditor`wi::graphics::GraphicsDevice_Vulkan::SubmitCommandLists(this=0x000055555705a380) at wiGraphicsDevice_Vulkan.cpp:7247:26
    frame #6: 0x0000555555babf01 WickedEngineEditor`wi::Application::Run(this=0x00007fffff8d4990) at wiApplication.cpp:252:37
    frame #7: 0x00005555555b4661 WickedEngineEditor`sdl_loop(editor=0x00007fffff8d4990) at main_SDL2.cpp:16:19
    frame #8: 0x00005555555b4ce0 WickedEngineEditor`main(argc=1, argv=0x00007fffffffe818) at main_SDL2.cpp:162:23
    frame #9: 0x00007ffff7841d4a libc.so.6`___lldb_unnamed_symbol3264 + 122
    frame #10: 0x00007ffff7841e0c libc.so.6`__libc_start_main + 140
    frame #11: 0x00005555555b4285 WickedEngineEditor`_start + 37
```
I will play with this more next week, but I thought I would first ask for some feedback on the intent behind the large timeout.
Thanks.
EDIT: It occurred to me that it might be stuck in some other loop, and the debugger just happens to always stop it while it is waiting on that line (7247).
Hi, the "infinite" timeout is there on purpose: it would be invalid to continue while the GPU has not finished with the frame we are waiting on. Could you make sure that your graphics drivers are up to date?
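The suggestion to loop on VK_TIMEOUT with a shorter timeout could look roughly like the minimal sketch below. It assumes a hypothetical `WaitResult` enum standing in for the relevant VkResult values and a hypothetical `wait` callback that wraps `vkWaitForFences(..., slice_ns)`; neither is part of the engine. Waiting in slices does not relax the requirement that the CPU must not move on before the fence signals; it only gives the application a chance to log the stall and bail out instead of hanging silently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>

// Hypothetical stand-in for the VkResult values that matter here:
// VK_SUCCESS -> Success, VK_TIMEOUT -> Timeout, any VK_ERROR_* -> Error.
enum class WaitResult { Success, Timeout, Error };

// Waits in short slices instead of one effectively infinite call.
// `wait` is assumed to wrap something like
//   vkWaitForFences(device, 1, &fence, VK_TRUE, slice_ns)
// and translate its VkResult into WaitResult.
// Gives up after `max_slices` consecutive timeouts and returns Timeout,
// so the caller can report the stall instead of blocking forever.
WaitResult wait_in_slices(const std::function<WaitResult(std::uint64_t)>& wait,
                          std::uint64_t slice_ns, int max_slices)
{
    assert(max_slices > 0);
    for (int i = 0; i < max_slices; ++i)
    {
        const WaitResult r = wait(slice_ns);
        if (r != WaitResult::Timeout)
            return r; // fence signaled, or a real error occurred
        std::fprintf(stderr, "fence still unsignaled after %d slice(s)\n", i + 1);
    }
    return WaitResult::Timeout;
}
```

With, say, a 100 ms slice and 50 slices, a stuck fence would produce a log line every 100 ms and an explicit Timeout result after roughly 5 s.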
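The reason a wait is unavoidable can be illustrated with the usual frames-in-flight pattern: per-frame resources (including `frame_fence[bufferindex][queue]`) live in a ring of BUFFERCOUNT slots, so before frame N reuses a slot, the frame that last used it must be finished on the GPU. The sketch below assumes that `GetBufferIndex()` behaves like `FRAMECOUNT % BUFFERCOUNT` and that BUFFERCOUNT is 2; both are assumptions for illustration, not the engine's confirmed implementation.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration; the engine's actual BUFFERCOUNT may differ.
constexpr std::uint64_t BUFFERCOUNT = 2;

// Assumption: the per-frame slot cycles through the ring like this.
std::uint32_t buffer_index(std::uint64_t framecount)
{
    return static_cast<std::uint32_t>(framecount % BUFFERCOUNT);
}

// Returns true and sets `previous` to the frame that last used the same slot,
// i.e. the frame whose fences the CPU must wait on before reusing that slot.
// The first BUFFERCOUNT frames have no predecessor, which mirrors the
// `if (FRAMECOUNT >= BUFFERCOUNT)` guard in the engine code.
bool frame_to_wait_for(std::uint64_t framecount, std::uint64_t& previous)
{
    if (framecount < BUFFERCOUNT)
        return false;
    previous = framecount - BUFFERCOUNT;
    assert(buffer_index(previous) == buffer_index(framecount));
    return true;
}
```

If the GPU never signals the fence of the previous frame in that slot (for example because work on one queue never completes), the CPU has no valid way to proceed, which is why a timeout can at best report the problem rather than skip it.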
I did a full update and verified that I have the latest driver, and I was able to get it to freeze again immediately (by loading scripts under "Content").
```
local/nvidia 550.78-7
    NVIDIA drivers for linux
```
@ricejasonf Wicked recently updated dxcompiler to the May version, which seems to be broken on Linux (#856) and caused all kinds of weird issues on various graphics drivers. It has been reverted to the previous version; can you update to master and give it another try?
Sorry, but the problem still persists. It does not happen every time, but it still definitely freezes when loading a script.
Did you delete the `shaders/spirv` directory, just to make sure no shaders compiled by that dxcompiler version remain?
I deleted the entire build directory. If that is where they are located, then yes. (I am on the Discord if that is easier for back-and-forth.)
I can confirm that it is in fact getting stuck in that vkWaitForFences call. Consider the following small change at the point of interest:
```cpp
7247    while (true) {
7248        res = vkWaitForFences(device, 1, &frame_fence[bufferindex][queue],
7249                              VK_TRUE, uint64_t{10000000000});
7250        if (res == VK_SUCCESS) break;
7251        assert(res == VK_SUCCESS);
7252    }
```
Attempting to reproduce the error results in hitting the assert after 10 seconds of blank screen.
```
WickedEngineEditor: /home/jason/Projects/WickedEngine/WickedEngine/wiGraphicsDevice_Vulkan.cpp:7251: virtual void wi::graphics::GraphicsDevice_Vulkan::SubmitCommandLists(): Assertion `res == VK_SUCCESS' failed.
Aborted (core dumped)
```
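For scale, the two timeout values in this thread can be compared with a quick compile-time check: the modified loop waits 10'000'000'000 ns (10 s) per call, while the engine's 0xFFFFFFFFFFFFFFFF ns (UINT64_MAX) is roughly 584 years, which is why the original call behaves as an unbounded wait. The constant names below are illustrative only.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

// Timeout used in the modified diagnostic loop: 10'000'000'000 ns.
constexpr std::chrono::nanoseconds diagnostic_timeout{10'000'000'000};
constexpr std::chrono::seconds diagnostic_seconds =
    std::chrono::duration_cast<std::chrono::seconds>(diagnostic_timeout);

// Timeout used by the engine: 0xFFFFFFFFFFFFFFFF ns. It does not fit in the
// signed rep of std::chrono::nanoseconds, so convert via double instead.
constexpr std::uint64_t engine_timeout_ns = std::numeric_limits<std::uint64_t>::max();
constexpr double seconds_per_year = 365.25 * 24 * 3600; // Julian year
constexpr double engine_timeout_years =
    static_cast<double>(engine_timeout_ns) / 1e9 / seconds_per_year;

static_assert(diagnostic_seconds.count() == 10, "diagnostic timeout is 10 s");
static_assert(engine_timeout_years > 584.0 && engine_timeout_years < 585.0,
              "UINT64_MAX ns is about 584 years");
```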
It would be nice to find the bug, but I think there is also an opportunity for graceful error handling here.
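One possible shape for that graceful handling is sketched below: instead of asserting, the result of a bounded fence wait is turned into a message that names the stuck queue, which the caller could log before skipping the frame or shutting down cleanly. `FenceWait`, `describe_queue`, and `check_frame_fence` are hypothetical names; `FenceWait` stands in for VK_SUCCESS, VK_TIMEOUT, and VK_ERROR_DEVICE_LOST from vkWaitForFences. Only queue index 3 == QUEUE_VIDEO_DECODE is taken from this thread; other indices are printed as plain numbers.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the VkResult values of a bounded vkWaitForFences:
// VK_SUCCESS -> Signaled, VK_TIMEOUT -> TimedOut, VK_ERROR_DEVICE_LOST -> DeviceLost.
enum class FenceWait { Signaled, TimedOut, DeviceLost };

// Human-readable queue description for diagnostics. Only index 3
// (QUEUE_VIDEO_DECODE) is confirmed in this thread; others stay numeric.
std::string describe_queue(int queue)
{
    assert(queue >= 0);
    std::string s = "queue " + std::to_string(queue);
    if (queue == 3)
        s += " (QUEUE_VIDEO_DECODE)";
    return s;
}

// Instead of asserting, turn a failed wait into a message the caller can log.
// Returns std::nullopt when the fence signaled and rendering can continue.
std::optional<std::string> check_frame_fence(FenceWait result, int queue)
{
    switch (result)
    {
    case FenceWait::Signaled:
        return std::nullopt;
    case FenceWait::TimedOut:
        return "GPU did not finish " + describe_queue(queue) + " work in time";
    case FenceWait::DeviceLost:
        return "device lost while waiting on " + describe_queue(queue);
    }
    return "unknown fence wait result"; // unreachable with valid enum values
}
```

A message like this would have pointed at the video decode queue from the first report, without needing a debugger.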
I realized that this is a duplicate of #804.
Can you confirm that the hang always happens when `queue` is 3 (QUEUE_VIDEO_DECODE), and never with any other value?
I tried it several times and the value of `queue` was consistently 3, so yes, that matches the enum value of QUEUE_VIDEO_DECODE you mentioned.
I can also reproduce this very quickly by resizing the entity component system widget window, just wagging it back and forth. Still always `queue == 3`.
~~Duplicate of #804~~
Edit: I decided to mark #804 as a duplicate; even though it's the older one, most of the information is in this issue.
This should be fixed now.