Potential Vulkan renderPass optimization within CommandList::writeBuffer()
After working on RBDoom3BFG with nvrhi for a while and optimizing the use of Vulkan push constants vs. constant buffers to improve performance (especially on macOS), I realize there may be a simpler optimization involving renderPass completion when calling CommandList::writeBuffer().
Ending a renderPass is expensive on macOS due to the need for Metal renderEncoders. The fewer renderPasses and renderEncoders the better, thus push constants are preferred. However, when looking at the code below, it strikes me that ending a renderPass may not be truly required for the case of volatile (constant) buffers. I am wondering whether endRenderPass() could be moved after handling the special case of volatile buffers like shown below.
Empirically this seems to work, but I am not sure of any other implications. Looking for feedback.
void CommandList::writeBuffer(IBuffer* _buffer, const void *data, size_t dataSize, uint64_t destOffsetBytes)
{
Buffer* buffer = checked_cast<Buffer*>(_buffer);
assert(dataSize <= buffer->desc.byteSize);
assert(m_CurrentCmdBuf);
old
—--> endRenderPass(); // is this truly required for volatile (likely constant) buffers?
m_CurrentCmdBuf->referencedResources.push_back(buffer);
if (buffer->desc.isVolatile)
{
assert(destOffsetBytes == 0);
writeVolatileBuffer(buffer, data, dataSize);
return;
}
new
—--> endRenderPass(); // could the renderPass be ended here instead, i.e. for non-volatile buffers only?
const size_t vkCmdUpdateBufferLimit = 65536;
...
Not an expert on this subject, I did on my side a much less subtle change as I removed the endRenderPass() for all cases and never had any problems. I only use the Vulkan backend and I probably don't have tested every cases.
@GaelCathelin thanks for your info. FYI - the reason I did not remove it entirely is based on the Vulkan spec: see https://docs.vulkan.org/refpages/latest/refpages/source/vkCmdUpdateBuffer.html. The key phrase is: "vkCmdUpdateBuffer is only allowed outside of a render pass", i.e. for non-volatile buffers the renderPass should be ended.
In contrast, volatile buffers are based on mapped host memory and a separate memory slot is used for each volatile instance. I believe this host memory is flushed to the GPU on queue submit and the occupied slots will not change until after the command buffer is finished executing and the associated slots can be released and reused for the next command buffer. That's why I am suggesting that ending the renderPass may not be necessary in the volatile buffer case.
I am curious what @apanteleev thinks about this.
I think moving the endRenderPass() call to after the volatile buffer condition makes perfect sense. The writeVolatileBuffer command is not a real Vulkan transfer operation, unlike vkCmdUpdateBuffer, and is allowed inside a render pass. I just gave it a quick try and didn't observe any issues.
I think moving the endRenderPass() call to after the volatile buffer condition makes perfect sense.
Thanks @apanteleev for reviewing and testing so quickly. I have submitted a small PR to address this.