vulkano
vulkano copied to clipboard
Readback synchonization
Template
- Version of vulkano: 0.26.0
Issue
Host readback of data written by GPU requires a barrier as follows:
srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT
dstAccessMask = VK_ACCESS_HOST_READ_BIT
Vulkano does not do that automatically nor offers a way to manually inject such a barrier in AutoCommandBufferBuilder. This is racy and should be avoided.
Is this specifically when using CpuAccessibleBuffer or also in other cases?
Specifically for CpuAccessibleBuffer, yes.
Actually, I was wrong. This is not specific to CpuAccessibleBuffer.
Even when using a pool of buffers for trasnfer and copy_buffer operations, it's still necessary to issue a TRANSFER_WRITE -> HOST_READ barrier. The TRANSFER_READ -> TRANSFER_WRITE barrier issued in the current implementation is not enough.
Does CpuBufferPool allow the host (CPU) to read back from the buffer? I don't see any capability for that in its API. Under what circumstances are these barriers needed for host-to-device transfers? Is there a specific part of the spec where this is detailed?
Does
CpuBufferPoolallow the host (CPU) to read back from the buffer?
My bad, it doesn't. It does have a download function that seems to form a part of (incomplete) API to read back from GPU.
Under what circumstances are these barriers needed for host-to-device transfers? Is there a specific part of the spec where this is detailed?
Host-to-device? Almost never necessary due to an implicit guarantee.
Device-to-host? Always, because there's no implicit guarantee. On the hardware side, you need to ensure that whatever does the writes (shader/DMA) has flushed L2, for example.
Revisiting this issue, I'm wondering what a practical solution to this would be. When recording a command buffer, the code in SyncCommandBufferBuilder doesn't know that the user intends to perform HOST_READ access after the command buffer is submitted, so it will not include that bit in its pipeline barriers. This same issue also crops up when submitting multiple command buffers in sequence without semaphores: the individual command buffers don't know about each other, so they don't know that they should put barriers to separate each other's accesses.
I might be missing something, but could a simple solution be to issue a HOST_READ barrier for buffers that are copied from GPU to CPU, e.g. here: https://github.com/vulkano-rs/vulkano/blob/master/vulkano/src/command_buffer/commands/copy.rs#L2645
Is there a way to distinguish CpuAccessibleBuffer and DeviceLocalBuffer via BufferAccess (maybe via buffer_access.inner().buffer.memory())?
There isn't currently a way to distinguish them. But is it actually correct to have such a barrier if the resource isn't going to be accessed by the CPU? What if the GPU is the next thing to access it instead?
Well, redundant barriers are always harmless (unless they do image layout transition which is irrelevant here).
There isn't currently a way to distinguish them. But is it actually correct to have such a barrier if the resource isn't going to be accessed by the CPU? What if the GPU is the next thing to access it instead?
When copying a buffer from GPU to CPU accessible memory, is the intend not to access it from the CPU and would therefore be a sane default considering all other operations are safe. If no access is desired this could also be distinguished if required by introducing a CPU resident but inaccessible buffer type, which would make the intent of the user clear.
I suppose it works well as a default. @marc0246 is currently working on a rewrite of the buffer types, so I will wait for that to finish before I implement anything. Or he sees this and does it himself. :)