vkGetInstanceProcAddr returns non-NULL for unsupported functions

Open baldurk opened this issue 1 year ago • 21 comments

Describe the bug

The Vulkan spec defines what vkGetInstanceProcAddr should return in each case. Aside from a couple of special cases such as global functions, the three main cases that return non-NULL are:

  • core dispatchable command
  • enabled instance extension dispatchable command for instance
  • available device extension dispatchable command for instance

And it says that "any other case, not covered above" must explicitly return NULL. There are some squirrelly details in the cases above, especially regarding the recent maintenance5 change to whether core functions from a version higher than the one the application requested can be returned, but I don't see any case where an entirely unsupported extension function would be allowed to return a function pointer.
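The rule as summarised above can be sketched as a small decision function. This is only an illustration of my reading of the spec, not loader code: `CommandKind`, `CommandInfo` and `gipa_may_return_non_null` are hypothetical names, and the sketch deliberately ignores global commands and the maintenance5 version details.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a command name, for illustration only. */
typedef enum {
    CMD_CORE_DISPATCHABLE,   /* core dispatchable command */
    CMD_INSTANCE_EXTENSION,  /* command from an instance extension */
    CMD_DEVICE_EXTENSION,    /* command from a device extension */
    CMD_UNKNOWN              /* anything else */
} CommandKind;

typedef struct {
    CommandKind kind;
    bool instance_ext_enabled;  /* extension was enabled on the instance */
    bool device_ext_available;  /* extension is available for the instance */
} CommandInfo;

/* Returns true if vkGetInstanceProcAddr, called with a valid instance,
 * may return a non-NULL pointer for this command. Every case not listed
 * must return NULL. */
bool gipa_may_return_non_null(const CommandInfo *cmd)
{
    switch (cmd->kind) {
    case CMD_CORE_DISPATCHABLE:
        return true;
    case CMD_INSTANCE_EXTENSION:
        return cmd->instance_ext_enabled;
    case CMD_DEVICE_EXTENSION:
        return cmd->device_ext_available;
    default:
        return false;
    }
}
```

Under this reading, a device extension command whose extension is not available anywhere (the situation reported here) falls through to the NULL case.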

When testing, though, it looks like any extension that the loader is aware of at build time will return a function pointer here via trampoline_get_proc_addr -> extension_instance_gpa, when it should instead return NULL for extensions that are not available. For example, on my desktop NVIDIA system I was able to get a function pointer for vkGetDynamicRenderingTilePropertiesQCOM, which is not a core command, an enabled instance extension command, or an available device extension command.

I believe this is not the case for extensions that the loader is unaware of at build time. In that case it looks like it will only return a function pointer if something down the chain implements it.

I ran into this because it was causing some crashes when Mesa queries for the calibrated timestamp functions. It queries first for the promoted vkGetPhysicalDeviceCalibrateableTimeDomainsKHR, and only if that is NULL does it query for vkGetPhysicalDeviceCalibrateableTimeDomainsEXT.

Previously, because RenderDoc had a similar bug, this resulted in a crash: it called through the KHR function, the call did not go through RenderDoc's layer to unwrap the physical device, and it broke after that because the physical device was not as expected. I fixed RenderDoc's bug so that it properly returns NULL for the KHR function, but that means it now just crashes by calling directly to NULL from the trampoline. I've also now worked around this issue by implementing the KHR extension in RenderDoc, but I believe this will still crash if code like that (which is dubiously valid according to the spec) is run directly on a driver that supports the EXT but not the KHR.
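To make the failure mode concrete, here is a minimal sketch of the KHR-then-EXT fallback pattern described above, run against two stand-in lookups. Nothing here is Mesa, RenderDoc or loader code: `VoidFn`, `LookupFn`, `pick_time_domains_fn`, the two lookup functions and the two stub targets are all hypothetical, and the lookups simply model a compliant implementation versus one that hands back a pointer for the unsupported KHR name.

```c
#include <stddef.h>
#include <string.h>

typedef void (*VoidFn)(void);                 /* stand-in for PFN_vkVoidFunction */
typedef VoidFn (*LookupFn)(const char *name); /* stand-in for a bound vkGetInstanceProcAddr */

#define NAME_KHR "vkGetPhysicalDeviceCalibrateableTimeDomainsKHR"
#define NAME_EXT "vkGetPhysicalDeviceCalibrateableTimeDomainsEXT"

/* Query the promoted KHR name first; fall back to EXT only if KHR is NULL. */
VoidFn pick_time_domains_fn(LookupFn lookup)
{
    VoidFn fn = lookup(NAME_KHR);
    if (fn == NULL)
        fn = lookup(NAME_EXT);
    return fn;
}

/* Hypothetical targets: a working EXT entry point and a trampoline
 * with nothing implemented behind it. */
void ext_impl(void) {}
void dead_trampoline(void) {}

/* Models the expected behaviour: only EXT is supported, KHR yields NULL. */
VoidFn compliant_lookup(const char *name)
{
    if (strcmp(name, NAME_EXT) == 0)
        return ext_impl;
    return NULL;
}

/* Models the reported behaviour: KHR is known at build time, so a
 * pointer comes back even though nothing down the chain supports it. */
VoidFn buggy_lookup(const char *name)
{
    if (strcmp(name, NAME_KHR) == 0)
        return dead_trampoline;
    if (strcmp(name, NAME_EXT) == 0)
        return ext_impl;
    return NULL;
}
```

With `compliant_lookup` the caller ends up with `ext_impl`. With `buggy_lookup` the NULL check passes on the first query, the EXT fallback is never reached, and the caller is left holding `dead_trampoline`.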

Environment (please complete the following information):

  • OS: Windows 10 (though it reproduces on Linux too)
  • Bitdepth: 64-bit
  • GPU: NVIDIA GeForce RTX 4070
  • Graphics Driver: NVIDIA 551.23
  • SDK or header version if building from repo: 1.3.277
  • Enabled layers: Repros with none, crashes with RenderDoc

To Reproduce

Steps to reproduce the behavior:

  1. Create a Vulkan 1.3 instance with no extensions or only WSI extensions.
  2. Enumerate only an NVIDIA physical device that does not support VK_QCOM_tile_properties, or substitute any clearly unsupported extension.
  3. Call vkGetInstanceProcAddr(instance, "vkGetDynamicRenderingTilePropertiesQCOM")
  4. See that a function pointer is returned instead of NULL. It's not valid to call, and it should not have been returned.

VK_LOADER_DEBUG output

vk_loader_debug.txt

Additional context

To get the proper Get*ProcAddr behaviour from RenderDoc you'll need to build from the latest v1.x branch, as I've just pushed the fix to return NULL from GIPA/GDPA for unknown functions. The bug is independent of that, though.

baldurk, Feb 13 '24 13:02