radeon_gpu_profiler
VK_ERROR_OUT_OF_HOST_MEMORY on Windows when profiling Vulkan
Hi, I have a problem that's unique to Windows 10 x64: after enabling profiling in the Developer Panel, my app cannot start up and fails with VK_ERROR_OUT_OF_HOST_MEMORY. Without profiling, everything runs fine. The error occurs in vkCreateDevice. I think the problem might be with the queues I request. If I request only one graphics queue, vkCreateDevice succeeds, but if I request 1 graphics, 3 compute, and 1 transfer queue, I get the error.
I am only enabling VK_KHR_swapchain, and the error occurs even if I don't select anything in VkPhysicalDeviceFeatures. I have tested with and without validation layers. My application requests Vulkan 1.1, although IIRC I also had this problem with 1.0. I have stopped overlays and tools such as RTSS, Steam, and RenderDoc, but nothing helps.
On Linux, everything runs fine (I just need to point VK_ICD_FILENAMES at AMDVLK, because I use RADV by default). I have not checked whether the driver advertises a different set of queue families and capacities there; my code is somewhat flexible in this regard.
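For readers comparing queue setups across drivers, here is a minimal sketch of how a request like "1 graphics, N compute, 1 transfer" could be planned against whatever the driver advertises. It is not the code from this application. `QueueFamily`, `QueueRequest`, and `planQueues` are hypothetical stand-ins for `VkQueueFamilyProperties`, `VkDeviceQueueCreateInfo`, and the application's own setup logic, so the block compiles without the Vulkan headers; only the three flag values are taken from the Vulkan headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the VkQueueFamilyProperties fields used here.
// A real application would fill this from vkGetPhysicalDeviceQueueFamilyProperties.
struct QueueFamily {
    std::uint32_t flags;       // capability bits, see the constants below
    std::uint32_t queueCount;  // number of queues the driver advertises
};

// Same values as VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT, VK_QUEUE_TRANSFER_BIT.
constexpr std::uint32_t kGraphics = 0x1;
constexpr std::uint32_t kCompute  = 0x2;
constexpr std::uint32_t kTransfer = 0x4;

// Hypothetical stand-in for the VkDeviceQueueCreateInfo fields that matter here.
struct QueueRequest {
    std::uint32_t familyIndex;
    std::uint32_t queueCount;
};

// Requests one graphics queue, up to `wantedCompute` queues from the first
// compute-only family, and one queue from the first transfer-only family.
// Counts are clamped to what each family advertises.
std::vector<QueueRequest> planQueues(const std::vector<QueueFamily>& families,
                                     std::uint32_t wantedCompute) {
    std::vector<QueueRequest> plan;
    bool haveGraphics = false, haveCompute = false, haveTransfer = false;
    for (std::uint32_t i = 0; i < families.size(); ++i) {
        const QueueFamily& f = families[i];
        if (f.queueCount == 0) continue;
        std::uint32_t count = 0;
        if (f.flags & kGraphics) {
            if (!haveGraphics) { count = 1; haveGraphics = true; }
        } else if (f.flags & kCompute) {
            if (!haveCompute && wantedCompute > 0) {
                count = std::min(wantedCompute, f.queueCount);
                haveCompute = true;
            }
        } else if (f.flags & kTransfer) {
            if (!haveTransfer) { count = 1; haveTransfer = true; }
        }
        if (count == 0) continue;
        // Vulkan requires 0 < queueCount <= the family's advertised count.
        assert(count <= f.queueCount);
        plan.push_back({i, count});
    }
    return plan;
}
```

Calling `planQueues(families, 4)` on a device whose compute-only family advertises 3 queues yields a request for 3, so the same call can serve drivers that expose different capacities. Logging the resulting plan on each platform would also show whether the two drivers really differ.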
Configuration:
- Windows 10 Enterprise N x64 1803, OS build 17134.48
- RX Vega 64
- Threadripper 1950X
- LunarG SDK 1.1.73
- RGP 1.2.0.21 (on Windows, some other 1.2 release on Linux)
- GPU Driver 18.5.2
RDP log:
[RDP] Received client connected from unknown client with id 6800.
[RDP] Received client halted from unknown client with id 6800.
[RDP] Processing halted client with id 6800: v4.exe:11144 - AMD Vulkan Driver
[RDP] Updated v4.exe ClientId to 6800
[RDP] Connected DriverControlClient to process 'v4.exe', ProcessId = 11144
[RDP] Filtered halted process with ProcessId = 11144
[RDP] Enabled profiling for target executable 'v4.exe', ProcessId = 11144.
[RDP] Set profiling flag for ProcessId = 11144 to true.
[RDP] Capture profile button is enabled because the target application is profilable and there is no profile in progress.
[RDP] Found 0 settings.
[RDP] Resumed execution of process 'v4.exe', ProcessId = 11144. Disconnect client.
[RDP] Wait for driver initialization in process 'v4.exe' failed.
[RDP] Attempted to disconnect from DriverControlClient that was already disconnected.
[RDP] Client with Id 6800 has disconnected.
[RDP] Capture profile button has been disabled because the application is not profilable.
Hi, Thanks for the feedback. Could you provide a very simple test app that duplicates the issue you're seeing?
Also, try updating your driver to the latest 18.8.2 and get the latest RGP 1.3. There have been a number of changes that may fix this issue.
Thanks, Tony.
Hey, I re-tested on 18.8.2 with RGP 1.3 and I get the same issue. As soon as I enable profiling in RDP, my app crashes on vkCreateDevice. When I disable profiling, it starts working correctly again.
Using the new DebugUtils extension, I get this in my logs: [ Loader Message ] ERROR & GENERAL => terminator_CreateDevice: Failed in ICD C:\WINDOWS\System32\DriverStore\FileRepository\c0332601.inf_amd64_5beeaaa0c940e99c\B332635\.\amdvlk64.dll vkCreateDevicecall
I think this may be related to a similar issue I had in baldurk/renderdoc#1078, when replaying a capture would also crash with the same exact error.
As for reproducing, maybe you could try opening the capture I uploaded in that issue? I just checked and can still reproduce that error. Interestingly, when I reboot my machine and replay that capture without doing anything else, it works on first try. When I close and re-open the same capture, it starts failing with that error.
As for reproducing this with my app, I could send you the source code, but it does not use a common Visual Studio build, so it might be a hassle for you. If you have a secure sandbox, I could send you the binary; maybe that would be easier?
I don't know if something is wrong with my Vulkan SDK or Radeon driver installation. I tried reinstalling both and starting RDP with an explicit VK_LAYER_PATH, but nothing seems to make a difference here (although it did for that RenderDoc issue).
Hi Farnoy,
After re-reading your initial post and talking to others: we're actively working on a fix for RGP capture on multiple compute queues, which will ship in a new driver release shortly. If you would still like to share your application, I've added a Dropbox share here: https://www.dropbox.com/home/RGP-farnoy. We can test with the internal driver we have and let you know if it fixes the issue you're seeing.
Thanks, Tony.
Hey @ahosier,
I have started using compute queues recently, so this would fit what you're saying. I uploaded Linux and Windows x64 binaries and an asset that it uses. The working app should render a static imgui window and a bunch of helmets, you can press G to disable camera movement.
My app is hardcoded to request all compute queues available, and always submits to 3 of them in the Windows version, 4 in the Linux one. This is how many I have on RX Vega 64 and I didn't bother making this dynamic yet. Let me know if that's a problem for you.
Please keep me updated when a fix arrives!
Thank you, Jakub
I seem to have a similar error on my RX 560 on Windows:
validation layer: terminator_CreateDevice: Failed in ICD C:\windows\System32\DriverStore\FileRepository\u0339878.inf_amd64_c30429afa55bc85b\B339766.\amdvlk64.dll vkCreateDevicecall
validation layer: vkCreateDevice: Failed to create device chain.
RuntimeError: vk::PhysicalDevice::createDevice: ErrorOutOfHostMemory
This only occurs while I'm running the Radeon Developer Panel. If I close the Radeon Developer Panel, vkCreateDevice succeeds. Other tools like RenderDoc also crash when "settings > core > Enable Radeon GPU Profiler Integration" is enabled.
My application only uses one graphics/present queue, and no compute queues.
I seem to be able to start Sascha's examples with the profiler attached, but I'm unable to capture any profiles...
Hi, please make sure you are using the latest AMD driver (19.3.1), and that you do NOT have another GPU in your system. If you are running Intel integrated graphics, please disable it in Device Manager. Currently, RGP only reliably supports the presence of a single GPU (two AMD GPUs can cause issues too, BTW; it's not a vendor thing).
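The single-GPU requirement above could be checked at application startup. The following is a rough sketch, not part of RGP or the driver. `Adapter` and `profilingWarning` are hypothetical names; a real application would fill the list from `vkEnumeratePhysicalDevices` and `vkGetPhysicalDeviceProperties`. The only fixed value is 0x1002, AMD's PCI vendor ID.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the VkPhysicalDeviceProperties fields used here.
struct Adapter {
    std::string name;        // as in VkPhysicalDeviceProperties::deviceName
    std::uint32_t vendorId;  // as in VkPhysicalDeviceProperties::vendorID
};

constexpr std::uint32_t kVendorAmd = 0x1002;  // AMD's PCI vendor ID

// Returns an empty string when exactly one GPU is present and it is an AMD
// part; otherwise returns a human-readable warning.
std::string profilingWarning(const std::vector<Adapter>& adapters) {
    bool hasAmd = false;
    for (const Adapter& a : adapters) {
        if (a.vendorId == kVendorAmd) hasAmd = true;
    }
    if (!hasAmd) return "no AMD GPU enumerated";
    if (adapters.size() == 1) return "";

    // More than one adapter: list them so the user knows what to disable.
    std::string msg = std::to_string(adapters.size()) +
                      " GPUs present, RGP capture may be unreliable:";
    for (const Adapter& a : adapters) {
        msg += " [" + a.name + "]";
    }
    return msg;
}
```

Printing the returned string before device creation makes it obvious when a laptop's integrated GPU or a second discrete card is the likely reason profiling fails.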
@gselley Disabling my integrated graphics seems to prevent GLFW from initializing a window for some unknown reason. Might be a limitation of my e-GPU/laptop development environment.
Why is RGP only reliable in the presence of a single GPU? That makes developing on a laptop very difficult. It seems like something other frame capture tools have been able to account for, although perhaps more detailed profiling makes this a more difficult feat.
Hello, is there an update on this?
I'm also running into this problem on both Vulkan and D3D12 since I added an RTX 2070 alongside my RX 480. It would be nice if this was fixed, as it's really convenient to keep multiple cards in one computer instead of using multiple computers. It's also easy to choose which card to use through the interfaces the APIs provide.
There are still Known Issues related to profiling applications on systems with more than one GPU.