Vulkan-ValidationLayers icon indicating copy to clipboard operation
Vulkan-ValidationLayers copied to clipboard

Validation layer crash due to mishandle of vkDestroySwapchainKHR

Open Mr-222 opened this issue 5 months ago • 5 comments

Environment:

  • OS: Windows11
  • GPU and driver version: NVIDIA RTX3500 Ada Generation Laptop GPU / NVIDIA RTX Driver Release 560
  • SDK or header version if building from repo: 1.3.290
  • Options enabled (synchronization, best practices, etc.): N/A

Describe the Issue

This issue heavily depends on driver implementation, I only observe this issue in NVIDIA GPU.

When VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT is enabled, which means I manually disable the handle wrapping. At this time, when recreating a swapchain by calling vkCreateSwapchainKHR with oldswapchain parameter, the validation layer will ignore situations like VKImage from different swapchains uses same handles. So we may get same image handles used by old swapchains when calling vkGetSwapchainImagesKHR, which the validation layer is not properly handled. Here's a simple example:

  1. Create swapchain1
  2. Create swapchain2
  3. Destroy swapchain1

When destroy the swapchain1, the validation layer would also delete the swapchain's images in the StateMap and ObjectTable without checking if newly created swapchain is using those image handles. However, swapchain2 is still using those images. As a result, the program would crash afterwards because it can't find corresponding image handle when creating image view and framebuffer for the new swapchain.

A simple reproduce

You can use Vulkan sample to reproduce this bug crash easily, just add the VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT extension. What I did is change the Instance.cpp in Vulkan-Samples/framework/core, change the constructor of Instance, add the extension:

Instance::Instance(const std::string                            &application_name,
                   const std::unordered_map<const char *, bool> &required_extensions,
                   const std::vector<const char *>              &required_validation_layers,
                   const std::vector<VkLayerSettingEXT>         &required_layer_settings,
                   bool                                          headless,
                   uint32_t                                      api_version)
{
  ...

	// Some of the specialized layers need to be enabled explicitly
#ifdef USE_VALIDATION_LAYER_FEATURES
	VkValidationFeaturesEXT                   validation_features_info = {VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT};
	std::vector<VkValidationFeatureEnableEXT> enable_features{};
	if (validation_features)
	{
#	if defined(VKB_VALIDATION_LAYERS_GPU_ASSISTED)
		enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_RESERVE_BINDING_SLOT_EXT);
		enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT);
#	endif
#	if defined(VKB_VALIDATION_LAYERS_BEST_PRACTICES)
		enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_BEST_PRACTICES_EXT);
#	endif
#	if defined(VKB_VALIDATION_LAYERS_SYNCHRONIZATION)
		enable_features.push_back(VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT);
#	endif

		validation_features_info.enabledValidationFeatureCount = static_cast<uint32_t>(enable_features.size());
		validation_features_info.pEnabledValidationFeatures    = enable_features.data();
		validation_features_info.pNext                         = instance_info.pNext;
		instance_info.pNext                                    = &validation_features_info;
	}
#endif

	// Enable VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT
	VkValidationFeaturesEXT validation_features_info = {VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT};
	std::vector<VkValidationFeatureDisableEXT> disable_features{};
	disable_features.push_back(VK_VALIDATION_FEATURE_DISABLE_UNIQUE_HANDLES_EXT);
	validation_features_info.disabledValidationFeatureCount = static_cast<uint32_t>(disable_features.size());
	validation_features_info.pDisabledValidationFeatures    = disable_features.data();
	instance_info.pNext                                     = &validation_features_info;

	VkLayerSettingsCreateInfoEXT layerSettingsCreateInfo{VK_STRUCTURE_TYPE_LAYER_SETTINGS_CREATE_INFO_EXT};

	// If layer settings extension enabled by sample, then activate layer settings during instance creation
	if (std::find(enabled_extensions.begin(), enabled_extensions.end(), VK_EXT_LAYER_SETTINGS_EXTENSION_NAME) != enabled_extensions.end())
	{
		layerSettingsCreateInfo.settingCount = static_cast<uint32_t>(required_layer_settings.size());
		layerSettingsCreateInfo.pSettings    = required_layer_settings.data();
		layerSettingsCreateInfo.pNext        = instance_info.pNext;
		instance_info.pNext                  = &layerSettingsCreateInfo;
	}

	// Create the Vulkan instance
	VkResult result = vkCreateInstance(&instance_info, nullptr, &handle);

  ...
}

Also add the instance extension in ray_tracing_basic.cpp:

    RaytracingBasic::RaytracingBasic()
    {
    	...
    
    	add_instance_extension(VK_EXT_VALIDATION_FEATURES_EXTENSION_NAME, true);
    }

After that, run ray_tracing_basic sample, it may crash. It may also crashes when you resize the window. Note that there's a probability to this thing, because the GPU drive doesn't always choose to reuse image handles when recreating a swapchain. According to my experiments, this bug happens four times out of ten, and it only happens when using NVIDIA GPU. So if you don't see this bug crash at the first time, try resize the window multiple times or relaunch the sample several times.

Possible cause

There are 3 places in the code that contributes this bug, the validation layer doesn't check if the image handle is being used by newly created swapchain.

First is in image_state.cpp, line 605, this would cause validation layer can't create its image view object.

void Swapchain::Destroy() {
    for (auto &swapchain_image : images) {
        ...
        dev_data.Destroy<vvl::Image>(swapchain_image.image_state->VkHandle());
        // NOTE: We don't have access to dev_data.fake_memory.Free() here, but it is currently a no-op
    }
    ...
}

The second one lies in object_tracker_utils, line 820:

void ObjectLifetimes::PreCallRecordDestroySwapchainKHR(VkDevice device, VkSwapchainKHR swapchain,
                                                       const VkAllocationCallbacks *pAllocator, const RecordObject &record_obj) {
    ...

    auto snapshot = swapchain_image_map.snapshot(
        [swapchain](const std::shared_ptr<ObjTrackState> &pNode) { return pNode->parent_object == HandleToUint64(swapchain); });
    for (const auto &itr : snapshot) {
        swapchain_image_map.erase(itr.first);
    }
}

The last one is in thread_safety_validation.cpp, line 476:

void ThreadSafety::PostCallRecordDestroySwapchainKHR(VkDevice device, VkSwapchainKHR swapchain,
                                                     const VkAllocationCallbacks* pAllocator, const RecordObject& record_obj) {
    ...
    // Host access to swapchain must be externally synchronized
    auto lock = WriteLockGuard(thread_safety_lock);
    for (auto& image_handle : swapchain_wrapped_image_handle_map[swapchain]) {
        FinishWriteObject(image_handle, record_obj.location);
        DestroyObject(image_handle);
    }
    ...
}

Mr-222 avatar Sep 24 '24 11:09 Mr-222