
flakiness with `clEnqueueCopyImage`

Open rjodinchr opened this issue 1 year ago • 3 comments

I am seeing flaky results on several platforms (Arm and a few Intel GPUs) when running some of the CTS copy_images tests.

The failure is always the same: incorrect data in the output buffer.

I am wondering whether we are missing something in `cvk_command_image_image_copy::build_batchable_inner`:

cl_int cvk_command_image_image_copy::build_batchable_inner(
    cvk_command_buffer& cmdbuf) {

    VkImageSubresourceLayers srcSubresource =
        prepare_subresource(m_src_image, m_src_origin, m_region);

    VkOffset3D srcOffset = prepare_offset(m_src_image, m_src_origin);

    VkImageSubresourceLayers dstSubresource =
        prepare_subresource(m_dst_image, m_dst_origin, m_region);

    VkOffset3D dstOffset = prepare_offset(m_dst_image, m_dst_origin);

    VkExtent3D extent = prepare_extent(m_src_image, m_region);

    VkImageCopy region = {srcSubresource, srcOffset, dstSubresource, dstOffset,
                          extent};

    vkCmdCopyImage(cmdbuf, m_src_image->vulkan_image(), VK_IMAGE_LAYOUT_GENERAL,
                   m_dst_image->vulkan_image(), VK_IMAGE_LAYOUT_GENERAL, 1,
                   &region);

    return CL_SUCCESS;
}

In the other copy paths (image to buffer, buffer to image, image init) I see calls to `vkCmdPipelineBarrier`. Should we have one here as well?

rjodinchr, Jul 31 '24 12:07

Adding the following barrier after the copy did not fix the issue:

    VkMemoryBarrier memoryBarrier = {
        VK_STRUCTURE_TYPE_MEMORY_BARRIER, nullptr, VK_ACCESS_TRANSFER_WRITE_BIT,
        VK_ACCESS_MEMORY_WRITE_BIT | VK_ACCESS_MEMORY_READ_BIT};
    vkCmdPipelineBarrier(cmdbuf, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                         VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, 0, 1,
                         &memoryBarrier, 0, nullptr, 0, nullptr);

rjodinchr, Aug 13 '24 08:08

Right, the plot thickens. How reproducible is it?

kpet, Aug 13 '24 13:08

It fails often enough to be easily reproducible. Depending on the test and the GPU, the failure rate ranges from 10% to more than 50% of runs.

rjodinchr, Aug 13 '24 13:08
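
With failure rates in that range, a single clean run says little about whether a candidate fix worked. Below is a rough sketch of how many consecutive clean runs would be needed before a pass is meaningful. It assumes runs are independent and that the per-run failure rate is constant, neither of which is established in this thread. `runs_needed` is a hypothetical helper for illustration, not part of clvk or the CTS.

```cpp
#include <cassert>
#include <cmath>

// Smallest n such that a test which still fails with probability
// `failure_rate` on each run would show at least one failure in n runs
// with probability >= `confidence`, i.e. 1 - (1 - p)^n >= confidence.
// Assumption: runs are independent with a constant failure rate.
int runs_needed(double failure_rate, double confidence) {
    assert(failure_rate > 0.0 && failure_rate < 1.0);
    assert(confidence > 0.0 && confidence < 1.0);
    // Solve (1 - p)^n <= 1 - confidence for n and round up.
    const double n =
        std::log(1.0 - confidence) / std::log(1.0 - failure_rate);
    return static_cast<int>(std::ceil(n));
}
```

Under those assumptions, `runs_needed(0.10, 0.99)` gives 44 and `runs_needed(0.50, 0.99)` gives 7. So the tests at the low end of the reported range need a few dozen clean runs before a change can be trusted, while the worst offenders need fewer than ten.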