Destroying an Algorithm that is currently running violates the Vulkan spec (presently theoretical)

Open 20kdc opened this issue 3 years ago • 2 comments

Algorithm's destroy function destroys:

  • A Pipeline
  • A DescriptorPool
  • A PipelineLayout

without waiting for completion.

Other objects destroyed in the same function may cause similar errors.

vkDestroyPipeline: VUID-vkDestroyPipeline-pipeline-00765: All submitted commands that refer to pipeline must have completed execution

vkDestroyDescriptorPool: VUID-vkDestroyDescriptorPool-descriptorPool-00303: All submitted commands that refer to descriptorPool (via any allocated descriptor sets) must have completed execution

There is also the theoretical potential for:

vkDestroyPipelineLayout: VUID-vkDestroyPipelineLayout-pipelineLayout-02004: pipelineLayout must not have been passed to any vkCmd* command for any command buffers that are still in the recording state when vkDestroyPipelineLayout is called
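To make the ordering rule in these VUIDs concrete, here is a minimal sketch in plain C++ with no Vulkan calls. Everything in it (`FakeDevice`, `submit`, `waitForCompletion`, `destroy`, and the two teardown functions) is a hypothetical model made up for illustration, not Kompute's or Vulkan's API. It only shows that a destroy call issued while submitted work is still pending is the violation, and that waiting first (for example on a fence in real code) avoids it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a device with one in-flight submission. None of these
// names come from Kompute or Vulkan; they only mirror the rule in the VUIDs.
struct FakeDevice {
    bool workPending = false;      // true between submission and completion
    std::vector<std::string> log;  // records what each destroy call observed

    void submit() { workPending = true; }

    // Stand-in for waiting on a fence in real Vulkan code.
    void waitForCompletion() { workPending = false; }

    // Stand-in for vkDestroyPipeline / vkDestroyDescriptorPool: legal only once
    // all submitted commands that refer to the object have completed.
    void destroy(const std::string& object) {
        log.push_back(workPending ? "violation: " + object + " still in use"
                                  : "destroyed " + object);
    }
};

// Teardown as described in this issue: destroy right after submission.
std::vector<std::string> teardownWithoutWait() {
    FakeDevice device;
    device.submit();
    device.destroy("pipeline");
    device.destroy("descriptor pool");
    return device.log;
}

// Teardown that waits first, so every destroy call sees an idle object.
std::vector<std::string> teardownWithWait() {
    FakeDevice device;
    device.submit();
    device.waitForCompletion();
    assert(!device.workPending);  // nothing is in flight past this point
    device.destroy("pipeline");
    device.destroy("descriptor pool");
    return device.log;
}
```

In this model `teardownWithoutWait()` logs a violation for both objects, while `teardownWithWait()` logs two clean destroys. Whether the wait belongs in the Algorithm, the Sequence, or the Manager is a design question this sketch does not answer.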

20kdc, May 03 '21 15:05

OK, it seems that this test:

TEST(TestAsyncOperations, TestManagerAsyncExecutionDestroyDescriptors)
{
    {
        uint32_t size = 10;

        std::string shader(R"(
            #version 450

            layout (local_size_x = 1) in;

            layout(set = 0, binding = 0) buffer b { float pb[]; };

            shared uint sharedTotal[1];

            void main() {
                uint index = gl_GlobalInvocationID.x;

                sharedTotal[0] = 0;

                for (int i = 0; i < 100000000; i++)
                {
                    atomicAdd(sharedTotal[0], 1);
                }

                pb[index] = sharedTotal[0];
            }
        )");

        std::vector<uint32_t> spirv = kp::Shader::compileSource(shader);

        std::vector<float> data(size, 0.0);
        std::vector<float> resultAsync(size, 100000000);

        kp::Manager mgr;

        std::shared_ptr<kp::TensorT<float>> tensorA = mgr.tensor(data);
        std::shared_ptr<kp::TensorT<float>> tensorB = mgr.tensor(data);

        std::shared_ptr<kp::Sequence> sq1 = mgr.sequence();
        std::shared_ptr<kp::Sequence> sq2 = mgr.sequence();

        sq1->eval<kp::OpTensorSyncLocal>({ tensorA, tensorB });

        std::shared_ptr<kp::Algorithm> algo1 = mgr.algorithm({ tensorA }, spirv);
        std::shared_ptr<kp::Algorithm> algo2 = mgr.algorithm({ tensorB }, spirv);

        // AMD Drivers in Windows may see an error in this line due to timeout.
        // In order to fix this, it requires a change on Windows registries.
        // More details on this can be found here: https://docs.substance3d.com/spdoc/gpu-drivers-crash-with-long-computations-128745489.html
        // Context on solution discussed in github: https://github.com/EthicalML/vulkan-kompute/issues/196#issuecomment-808866505
        sq1->evalAsync<kp::OpAlgoDispatch>(algo1);
        sq2->evalAsync<kp::OpAlgoDispatch>(algo2);
        // The scope ends here without any evalAwait(), so the sequences and
        // algorithms are destroyed while both dispatches may still be running.
    }
}

reproduces the following validation errors, followed by a segmentation fault:

[2021-05-03 17:14:22.963] [debug] [Sequence.cpp:28] Kompute Sequence Destructor started
[2021-05-03 17:14:22.963] [debug] [Sequence.cpp:208] Kompute Sequence destroy called
[2021-05-03 17:14:22.963] [info] [Sequence.cpp:217] Freeing CommandBuffer
[2021-05-03 17:14:22.964] [debug] [Manager.cpp:25] [VALIDATION]: Validation - Validation Error: [ VUID-vkFreeCommandBuffers-pCommandBuffers-00047 ] Object 0: handle = 0x55f527277820, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x1ab902fc | Attempt to free VkCommandBuffer 0x55f527277820[] which is in use. The Vulkan spec states: All elements of pCommandBuffers must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.2.148.0/linux/1.2-extensions/vkspec.html#VUID-vkFreeCommandBuffers-pCommandBuffers-00047)
[2021-05-03 17:14:22.964] [debug] [Sequence.cpp:229] Kompute Sequence Freed CommandBuffer
[2021-05-03 17:14:22.964] [info] [Sequence.cpp:233] Destroying CommandPool
[2021-05-03 17:14:22.964] [debug] [Sequence.cpp:246] Kompute Sequence Destroyed CommandPool
[2021-05-03 17:14:22.964] [info] [Sequence.cpp:250] Kompute Sequence clearing operations buffer
[2021-05-03 17:14:22.964] [debug] [OpAlgoDispatch.cpp:18] Kompute OpAlgoDispatch destructor started
[2021-05-03 17:14:22.964] [debug] [Algorithm.cpp:33] Kompute Algorithm Destructor started
[2021-05-03 17:14:22.964] [debug] [Algorithm.cpp:84] Kompute Algorithm Destroying pipeline
[2021-05-03 17:14:22.964] [debug] [Manager.cpp:25] [VALIDATION]: Validation - Validation Error: [ VUID-vkDestroyPipeline-pipeline-00765 ] Object 0: handle = 0x55f527353328, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bdce5fd | Cannot call vkDestroyPipeline on VkPipeline 0x1a000000001a[] that is currently in use by a command buffer. The Vulkan spec states: All submitted commands that refer to pipeline must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.148.0/linux/1.2-extensions/vkspec.html#VUID-vkDestroyPipeline-pipeline-00765)
[2021-05-03 17:14:22.964] [debug] [Algorithm.cpp:96] Kompute Algorithm Destroying pipeline cache
[2021-05-03 17:14:22.965] [debug] [Algorithm.cpp:108] Kompute Algorithm Destroying pipeline layout
[2021-05-03 17:14:22.965] [debug] [Algorithm.cpp:120] Kompute Algorithm Destroying shader module
[2021-05-03 17:14:22.965] [debug] [Algorithm.cpp:146] Kompute Algorithm Destroying Descriptor Set Layout
[2021-05-03 17:14:22.965] [debug] [Algorithm.cpp:158] Kompute Algorithm Destroying Descriptor Pool
[2021-05-03 17:14:22.965] [debug] [Manager.cpp:25] [VALIDATION]: Validation - Validation Error: [ VUID-vkDestroyDescriptorPool-descriptorPool-00303 ] Object 0: handle = 0x55f527353328, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x4dad1ae8 | Cannot call vkDestroyDescriptorPool on VkDescriptorPool 0x140000000014[] that is currently in use by a command buffer. The Vulkan spec states: All submitted commands that refer to descriptorPool (via any allocated descriptor sets) must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.148.0/linux/1.2-extensions/vkspec.html#VUID-vkDestroyDescriptorPool-descriptorPool-00303)
[2021-05-03 17:14:22.965] [debug] [OpBase.hpp:28] Kompute OpBase destructor started
[2021-05-03 17:14:22.965] [debug] [Sequence.cpp:28] Kompute Sequence Destructor started
[2021-05-03 17:14:22.965] [debug] [Sequence.cpp:208] Kompute Sequence destroy called
[2021-05-03 17:14:22.966] [info] [Sequence.cpp:217] Freeing CommandBuffer
[2021-05-03 17:14:22.966] [debug] [Manager.cpp:25] [VALIDATION]: Validation - Validation Error: [ VUID-vkFreeCommandBuffers-pCommandBuffers-00047 ] Object 0: handle = 0x55f527275a30, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x1ab902fc | Attempt to free VkCommandBuffer 0x55f527275a30[] which is in use. The Vulkan spec states: All elements of pCommandBuffers must not be in the pending state (https://vulkan.lunarg.com/doc/view/1.2.148.0/linux/1.2-extensions/vkspec.html#VUID-vkFreeCommandBuffers-pCommandBuffers-00047)
[2021-05-03 17:14:22.966] [debug] [Sequence.cpp:229] Kompute Sequence Freed CommandBuffer
[2021-05-03 17:14:22.966] [info] [Sequence.cpp:233] Destroying CommandPool
[2021-05-03 17:14:22.966] [debug] [Sequence.cpp:246] Kompute Sequence Destroyed CommandPool
[2021-05-03 17:14:22.966] [info] [Sequence.cpp:250] Kompute Sequence clearing operations buffer
[2021-05-03 17:14:22.966] [debug] [OpTensorSyncLocal.cpp:23] Kompute OpTensorSyncLocal destructor started
[2021-05-03 17:14:22.966] [debug] [OpBase.hpp:28] Kompute OpBase destructor started
[2021-05-03 17:14:22.966] [debug] [OpAlgoDispatch.cpp:18] Kompute OpAlgoDispatch destructor started
[2021-05-03 17:14:22.966] [debug] [Algorithm.cpp:33] Kompute Algorithm Destructor started
[2021-05-03 17:14:22.966] [debug] [Algorithm.cpp:84] Kompute Algorithm Destroying pipeline
[2021-05-03 17:14:22.967] [debug] [Manager.cpp:25] [VALIDATION]: Validation - Validation Error: [ VUID-vkDestroyPipeline-pipeline-00765 ] Object 0: handle = 0x55f527353328, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bdce5fd | Cannot call vkDestroyPipeline on VkPipeline 0x130000000013[] that is currently in use by a command buffer. The Vulkan spec states: All submitted commands that refer to pipeline must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.148.0/linux/1.2-extensions/vkspec.html#VUID-vkDestroyPipeline-pipeline-00765)
[2021-05-03 17:14:22.968] [debug] [Algorithm.cpp:96] Kompute Algorithm Destroying pipeline cache
Makefile:90: recipe for target 'mk_run_tests' failed
make: *** [mk_run_tests] Segmentation fault

axsaucedo, May 03 '21 16:05

Yes, VUID-vkDestroyPipelineLayout-pipelineLayout-02004 only applies if someone leaves a sequence still recording.
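A rough way to see the difference between the two cases, assuming a sequence that is "still recording" corresponds to a command buffer in the Vulkan recording state. The state names below follow the command buffer lifecycle in the Vulkan spec; the helper functions are hypothetical and each models only the single VUID named in its comment, not the full set of rules for that call.

```cpp
#include <cassert>

// Command buffer lifecycle states as named in the Vulkan specification.
enum class CommandBufferState { Initial, Recording, Executable, Pending, Invalid };

// Hypothetical helper mirroring VUID-vkFreeCommandBuffers-pCommandBuffers-00047:
// a command buffer must not be freed while it is in the pending state.
bool mayFreeCommandBuffer(CommandBufferState state) {
    return state != CommandBufferState::Pending;
}

// Hypothetical helper mirroring VUID-vkDestroyPipelineLayout-pipelineLayout-02004
// only: the layout must not have been passed to a vkCmd* command of a command
// buffer that is still in the recording state.
bool mayDestroyPipelineLayout(CommandBufferState state, bool layoutRecorded) {
    return !(state == CommandBufferState::Recording && layoutRecorded);
}

// Usage: the async test leaves its command buffers pending, not recording,
// so 00047 fires there while 02004 does not.
void checkAsyncTestScenario() {
    assert(!mayFreeCommandBuffer(CommandBufferState::Pending));
    assert(mayDestroyPipelineLayout(CommandBufferState::Pending, true));
    assert(!mayDestroyPipelineLayout(CommandBufferState::Recording, true));
}
```

This matches the log above, which shows the 00047 error for both command buffers but no 02004 error.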

20kdc, May 03 '21 16:05