sample_layer icon indicating copy to clipboard operation
sample_layer copied to clipboard

Segmentation fault in SampleLayer_CreateDevice

Open hysw opened this issue 2 years ago • 2 comments

I'm creating this issue since I can't find the source for https://renderdoc.org/vulkan-layer-guide.html and make a pull request.

In layer's CreateDevice function

92  VK_LAYER_EXPORT VkResult VKAPI_CALL SampleLayer_CreateDevice(...)
97  {
      ...
118   PFN_vkCreateDevice createFunc = (PFN_vkCreateDevice)gipa(VK_NULL_HANDLE, "vkCreateDevice");
119
120   VkResult ret = createFunc(physicalDevice, pCreateInfo, pAllocator, pDevice);

Here pfnNextGetInstanceProcAddr is called with VK_NULL_HANDLE and "vkCreateDevice" which the return is undefined according to the Vulkan spec. Most implementations happens to return a valid vkCreateDevice pointer. However renderdoc layer returns nullptr in this case and we got a segfault.

A correct implementation should get the instance from physical device and pass that to pfnNextGetInstanceProcAddr.

$ ENABLE_SAMPLE_LAYER=1 ENABLE_VULKAN_RENDERDOC_CAPTURE=1 vkcube

...

Thread 1 "vkcube" received signal SIGSEGV, Segmentation fault.

#0  ?? ()
#1  SampleLayer_CreateDevice (physicalDevice=0x5555555a67e0, 
    pCreateInfo=0x7fffffffc1c0, pAllocator=0x0, pDevice=0x7fffffffc178) at sample_layer.cpp:120
120       VkResult ret = createFunc(physicalDevice, pCreateInfo, pAllocator, pDevice);
...

(gdb) print createFunc
$1 = (PFN_vkCreateDevice) 0x0

hysw avatar Aug 18 '22 23:08 hysw

I can confirm that the query of vkCreateDevice with an invalid instance is incorrect vulkan behavior, however due to the loader having a bug which made that work, it is now a known "bug" in the loader.

PR that documents this behavior (and re-introduces it cause I tried to fix it last week but got pushback as it breaks a bunch of users systems) https://github.com/KhronosGroup/Vulkan-Loader/pull/1183

charles-lunarg avatar Apr 28 '23 20:04 charles-lunarg

I got hit by this as well, on Android (without RenderDoc).

However the issue is more subtle on Android as (PFN_vkCreateDevice)gipa(VK_NULL_HANDLE, "vkCreateDevice"); returns a non-null function pointer, and everything seems to work fine except the layer behaved incorrectly and many of the intercepted functions didn't get called back by the layer.

EDIT: That wasn't actually the real issue. The real issue was that on Android vkGetDeviceProcAddr() will be called by the loader to resolve the address of itself, and in that case I was forwarding the call to the next layer instead of intercepting it.

gregschlom avatar Aug 11 '23 19:08 gregschlom