Vulkan-ValidationLayers icon indicating copy to clipboard operation
Vulkan-ValidationLayers copied to clipboard

Let VVL help debug `VK_ERROR_FEATURE_NOT_PRESENT`?

Open MarijnS95 opened this issue 5 months ago • 13 comments

This request is probably the inverse of #335. In an app that I'm trying to capture in RenderDoc, that I'm not yet entirely familiar with, VK_ERROR_FEATURE_NOT_PRESENT is raised back to the app. I'm unsure which feature of the possibly large pNext chain is the root cause of this.

Since RenderDoc inserts the VVL between the app and its own capture layer (validated with VK_LOADER_DEBUG=layer, nice!), I'm also not sure if it's the app suddenly requesting something unsupported by the device (while it did work outside RenderDoc), the RenderDoc capture layer requesting something unsupported (wouldn't be caught by the VVL if it sits between the capture layer and the app, of course), or RenderDoc returning this error to this app if it turns out to be hiding a certain feature from the app.

#337 seems to remove pre validation to address #335, while I might be interested in (autogenerated?) post "validation" to compare and pretty-print the mismatching feature(s) whenever vkCreateDevice() returned VK_ERROR_FEATURE_NOT_PRESENT. Is that something that is in scope or even feasible for the VVL? Thanks!

MarijnS95 avatar Jul 30 '25 13:07 MarijnS95

VVL could easily detect VK_ERROR_FEATURE_NOT_PRESENT (also VK_ERROR_EXTENSION_NOT_PRESENT) in PostCallRecordCreateDevice

Honestly I do agree this is something with in VVL possibility, I would report it as a warning (since it is not an error)

#335

Will admit 7 years ago probably was easier to pin point the feature before we added 400 new features to the spec 😆

spencer-lunarg avatar Jul 30 '25 21:07 spencer-lunarg

That would be lovely. In the end I managed to bisect the problem (reverse GitHub issue link above), ~though can no longer find the quoted exact wording in the spec that says that (for at least vkGetPhyisicalDeviceFeatues/Properties2() nor even vkCreateDevice()) that each pNext instance must be of a type that is either defined in the supported/specified core version or a supported/specified extension.~

~As such I'm not sure if the driver returns this error because it's technically not supposed to know a feature struct when the extension is not enabled, or because it specifically sees the feature being turned on without the matching extension 😅~

EDIT: https://registry.khronos.org/vulkan/specs/latest/html/vkspec.html#fundamentals-validusage-pNext says:

Any component of the implementation (the loader, any enabled layers, and drivers) must skip over, without processing (other than reading the sType and pNext members) any extending structures in the chain not defined by core versions or extensions supported by that component.

The extension is still supported by the underlying driver, just not turned on in ppEnabledExtensionNames, so it is expected that it returns FEATURE_NOT_PRESENT based on this incompatibility.

MarijnS95 avatar Jul 30 '25 21:07 MarijnS95

@spencer-lunarg funny enough, I have an C++ feature struct code generated reflection header you can reference lol https://github.com/goarrg/vxr/blob/main/libvxr/vk/device/device_features_reflection.hpp

AlexRouSg avatar Jul 31 '25 17:07 AlexRouSg

@MarijnS95 threw something together, not sure if you still have your "broken" code to test it out with... if not no issue, I'm sure people will have it occur to them enough to quickly test this for us in the next SDK lol

spencer-lunarg avatar Aug 01 '25 19:08 spencer-lunarg

Thanks @spencer-lunarg! I haven't looked too closely at your PR, and don't have access to the machine today (it was specific interplay between Linux+Nvidia, using RenderDoc on o3de), but in hindsight I don't know if this is exactly going to fix the particular issue I was facing, and/or if we need to have additional validation for what I was running into. To recap the above issue from o3de:

  1. RenderDoc hides VK_KHR_pipeline_ray_tracing;
  2. o3de still passes VkPhysicalDeviceRayTracingPipelineFeaturesKHR (and maybe also VkPhysicalDeviceRayTracingPipelinePropertiesKHR) down into vkGetPhysicalDeviceFeatures/Properties2();
  3. RenderDoc doesn't cull these structs out from the pNext chain, or reset them after the query (as above: I can no longer find if it was invalid use to pass these structs in pNext or if the driver is supposed/expected to ignore them);
  4. The underlying Nvidia driver still fills them in as "sure, feature supported";
  5. Those queried feature structs, with VkPhysicalDeviceRayTracingPipelineFeaturesKHR::rayTracingPipeline set to VK_TRUE, are passed unconditionally into vkCreateDevice();
  6. Driver returns VK_ERROR_FEATURE_NOT_PRESENT.

In summary, I'm not sure if vkCreateDevice() (and the getter functions above?) should instead/also have validation against "unknown" SType structs with respect to the Vulkan version and supported extensions on the respective physical device?

MarijnS95 avatar Aug 01 '25 19:08 MarijnS95

@MarijnS95 does the extension get reported with vkEnumerateDeviceExtensionProperties? iirc renderdoc should at least not report there

AlexRouSg avatar Aug 01 '25 22:08 AlexRouSg

I learned VK_ERROR_EXTENSION_NOT_PRESENT gets reported by the Loader actually, so not sure vkEnumerateDeviceExtensionProperties matter (@charles-lunarg for correcting me if I am wrong)

spencer-lunarg avatar Aug 01 '25 22:08 spencer-lunarg

I mean if it doesn't get reported in vkEnumerateDeviceExtensionProperties but does get reported in vkGetPhysicalDeviceFeatures2 then validation can use the extension properties to do a first pass validation on the feature chain before going into the individual features

AlexRouSg avatar Aug 01 '25 22:08 AlexRouSg

@AlexRouSg it does not, as said above. This has nothing to do with "being reported by vkGetPhysicalDeviceFeatures2" because the caller specifies what it wants to query.


When talking about properties I didn't mean VkExtensionProperties (name and spec version), but VkPhysicalDeviceRayTracingPipelinePropertiesKHR. This gets filled in when running through renderdoc, just like VkPhysicalDeviceRayTracingPipelineFeaturesKHR.

https://registry.khronos.org/vulkan/specs/latest/html/vkspec.html#fundamentals-validusage-pNext says:

Each structure included in the pNext chain must be defined at runtime by either:

  • ...
  • a supported device extension in the case of physical-device-level functionality added by the device extension

So for these physical device level functions, it should be "invalid" in the context of this renderdoc-modified environment to query that, which is a known issue in o3de. That would be awesome to have as pre-validation before going into these physical-device-level functions.

MarijnS95 avatar Aug 05 '25 10:08 MarijnS95

@MarijnS95 sorry, I was speaking in terms of what validation can do, not what your program is doing.

so if the extension doesn't get reported in vkEnumerateDeviceExtensionProperties then validation can use that to warn about the chain having VkPhysicalDeviceRayTracingPipelinePropertiesKHR

AlexRouSg avatar Aug 05 '25 10:08 AlexRouSg

Yes, that's what I was already requesting / suggesting above.

The proposed PR definitely addresses something that I have wanted to have before, but wouldn't have found this issue that o3de turned out to have because the VVL also unconditionally starts querying the Feature structs without first checking if the extension(s) or core versions allow that struct that to be used: https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull/10482/files#r2248777579

That would instead be great to have as pre-validation (because it's not valid usage, at which point the VVL might as well check the overlap between feature fields beforehand even though #335 requested to remove it on the merit that turning on unsupported features is valid because the driver should return FEATURE_NOT_PRESENT in that case (but maybe we should not purely rely on it?).

Keep in mind that such validation should apply to all pdev-level functions including vkCreateDevice() and vkGetPhysicalDeviceProperties/Features2().

MarijnS95 avatar Aug 05 '25 12:08 MarijnS95

@spencer-lunarg I see your comment on #10482 saying you intended to leave this issue opened, did that changed?

AlexRouSg avatar Aug 20 '25 01:08 AlexRouSg

did that changed?

no, Github just auto closed it

spencer-lunarg avatar Aug 20 '25 01:08 spencer-lunarg