gpuweb Resource copying/clearing/updating investigations

Native APIs provide different constraints and features when it comes to resource copies and clears, where resources can be buffers or images. In this issue, we'll try to find a common ground (a least common denominator API) that is usable and efficient on all backends.

In Metal, all of the copy/clear operations are done via the MTLBlitCommandEncoder. In Vulkan, these are transfer operations, supported on any queue type. They require the TRANSFER_SRC flag on the source and the TRANSFER_DST flag on the destination.

Operation table

operation/backend	Vulkan	D3D12	Metal
clear buffer	vkCmdFillBuffer	views only with ClearUnorderedAccessView*	nothing
clear image	vkCmdClearColorImage, vkCmdClearDepthStencilImage	views only with ClearRenderTargetView, ClearDepthStencilView	nothing
update buffer	vkCmdUpdateBuffer, limited to 64k	nothing	nothing
update image	nothing	nothing	nothing
buffer -> buffer	vkCmdCopyBuffer	CopyBufferRegion	copy
buffer -> image	vkCmdCopyBufferToImage	CopyTextureRegion	copy
image -> buffer	vkCmdCopyImageToBuffer	CopyTextureRegion	copy
image -> image	vkCmdCopyImage	CopyTextureRegion	copy
image blit	vkCmdBlitImage	nothing	generateMipmaps

Buffer Updates

In D3D12, the only way to update a buffer with new data coming from CPU is to use a staging buffer (that is mapped, filled, then copied to the destination).

In Metal, a similar effect can be achieved by creating a buffer with makeBuffer that re-uses the existing storage.

In Vulkan, the implementation may have a fast-path for small buffer updates by in-lining the data right into the command buffer space. The implementation can fall back to a staging-like scheme for larger updates.
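
To make the fast-path/fallback split concrete, here is a minimal sketch of how a backend might decide between the two. The 65536-byte cap and the 4-byte alignment of offset and size come from the Vulkan rules for vkCmdUpdateBuffer; the function name choose_update_path and the "inline"/"staging" labels are hypothetical and only serve the illustration.

```python
# Limits from the Vulkan rules for vkCmdUpdateBuffer.
MAX_INLINE_UPDATE_BYTES = 65536  # the "64k" limit from the operation table
UPDATE_ALIGNMENT = 4             # dstOffset and dataSize must be multiples of 4


def choose_update_path(dst_offset: int, data_size: int) -> str:
    """Return "inline" if vkCmdUpdateBuffer could be used, else "staging".

    Hypothetical helper: a real backend would also consider pass state,
    queue capabilities, and so on.
    """
    if dst_offset < 0 or data_size <= 0:
        raise ValueError("dst_offset must be >= 0 and data_size must be > 0")
    fits = data_size <= MAX_INLINE_UPDATE_BYTES
    aligned = (dst_offset % UPDATE_ALIGNMENT == 0
               and data_size % UPDATE_ALIGNMENT == 0)
    return "inline" if fits and aligned else "staging"


if __name__ == "__main__":
    print(choose_update_path(0, 256))      # small, aligned update
    print(choose_update_path(0, 1 << 20))  # too large for the inline path
```

Anything that does not qualify for the inline path would go through the same map/fill/copy staging scheme described for D3D12 above.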

Image Blitting

Image blits differ from image copies in that they allow format conversion and arbitrary scaling with filtering. A typical use case for blitting is mipmap generation. It is not clear to me why/how Vulkan provides this on a transfer-only queue, but other APIs are far more (and reasonably) limited with regard to where and how they can blit surfaces.

Alignment rules

Vulkan

VkPhysicalDeviceLimits has optimal alignments for buffer data when transferring to/from image:

optimalBufferCopyOffsetAlignment is the optimal buffer offset alignment in bytes for vkCmdCopyBufferToImage and vkCmdCopyImageToBuffer
optimalBufferCopyRowPitchAlignment is the optimal buffer row pitch alignment in bytes for vkCmdCopyBufferToImage and vkCmdCopyImageToBuffer

These are not enforced by the validation layers but are recommended for optimal performance.

D3D12

MSDN section lists the following restrictions:

linear subresource copying must be aligned to D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT (512) bytes
row pitch aligned to D3D12_TEXTURE_DATA_PITCH_ALIGNMENT (256) bytes
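
As a rough illustration of what these two restrictions mean for a staging buffer, the sketch below rounds an offset and a row size up to the required alignments. The constants are the ones listed above; align_up and staging_layout are hypothetical helper names, and the total size is deliberately conservative (every row, including the last, is padded to the full pitch).

```python
D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT = 512
D3D12_TEXTURE_DATA_PITCH_ALIGNMENT = 256


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    if value < 0 or alignment <= 0:
        raise ValueError("value must be >= 0 and alignment must be > 0")
    return ((value + alignment - 1) // alignment) * alignment


def staging_layout(width_texels: int, height_rows: int,
                   bytes_per_texel: int, start_offset: int = 0):
    """Return (offset, row_pitch, size_in_bytes) for one 2D subresource."""
    offset = align_up(start_offset, D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT)
    row_pitch = align_up(width_texels * bytes_per_texel,
                         D3D12_TEXTURE_DATA_PITCH_ALIGNMENT)
    # Conservative: pad the last row as well.
    size = row_pitch * height_rows
    return offset, row_pitch, size


if __name__ == "__main__":
    # A 100x4 RGBA8 region placed at or after byte 700 of the staging buffer.
    print(staging_layout(100, 4, 4, start_offset=700))
```

For a 100-texel-wide RGBA8 row the tightly packed size is 400 bytes, so the pitch requirement pads each row to 512 bytes.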

Proposed API

Clears

The D3D12 model appears to be the least common denominator. If we have the concept of views, we can have API calls to clear them. In Vulkan, these calls would trivially translate into direct clears. In Metal, we'd need to run a compute shader to clear the resources. Supporting multiple clear rectangles seems to complicate this scheme quite a bit, so I suggest only doing full-slice clears.
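
For the compute-shader path on Metal, a full-slice clear mostly comes down to dispatching enough threadgroups to cover the slice. A minimal sketch of that arithmetic follows; the 8x8 threadgroup size and the name clear_dispatch_size are assumptions made for the example, not something any of the APIs prescribes.

```python
def clear_dispatch_size(width: int, height: int,
                        group_w: int = 8, group_h: int = 8):
    """Number of threadgroups needed to cover a width x height slice.

    The 8x8 group size is an assumed default; the shader must still
    bounds-check, since the last groups may overhang the slice.
    """
    if min(width, height, group_w, group_h) <= 0:
        raise ValueError("all dimensions must be positive")
    groups_x = (width + group_w - 1) // group_w   # ceiling division
    groups_y = (height + group_h - 1) // group_h
    return groups_x, groups_y


if __name__ == "__main__":
    print(clear_dispatch_size(1920, 1080))
    print(clear_dispatch_size(100, 50))
```

Restricting clears to full slices keeps this a single dispatch per slice, whereas a list of clear rectangles would need either one dispatch per rectangle or extra data passed to the shader.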

Updates

Given the limited support of resource updates, I suggest not providing this API at all in favor of requiring the user to use staging resources manually.

Copies

All 3 APIs appear to provide the copy capability between buffers and textures. The difference is mostly about the alignment requirements. I suggest having device flags that expose the minimum offset/pitch alignment required:

D3D12: equal to D3D12 constants
Vulkan: equal to optimal alignment features
Metal: some reasonable default selected by Apple
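
One way to picture these device flags is as a small per-device record that copies are validated against. The sketch below is only an illustration: CopyLimits, vulkan_limits and validate_buffer_image_copy are hypothetical names, the Vulkan values would come from VkPhysicalDeviceLimits at runtime, and no Metal values are given because the proposal leaves them to Apple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyLimits:
    """Hypothetical per-device limits for buffer <-> image copies."""
    min_offset_alignment: int
    min_row_pitch_alignment: int


# D3D12: equal to the D3D12 constants quoted in the alignment rules.
D3D12_LIMITS = CopyLimits(min_offset_alignment=512,
                          min_row_pitch_alignment=256)


def vulkan_limits(optimal_offset_alignment: int,
                  optimal_row_pitch_alignment: int) -> CopyLimits:
    """Build limits from values queried from VkPhysicalDeviceLimits."""
    return CopyLimits(optimal_offset_alignment, optimal_row_pitch_alignment)


def validate_buffer_image_copy(limits: CopyLimits,
                               buffer_offset: int, row_pitch: int) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    if buffer_offset % limits.min_offset_alignment != 0:
        errors.append("buffer offset %d is not a multiple of %d"
                      % (buffer_offset, limits.min_offset_alignment))
    if row_pitch % limits.min_row_pitch_alignment != 0:
        errors.append("row pitch %d is not a multiple of %d"
                      % (row_pitch, limits.min_row_pitch_alignment))
    return errors


if __name__ == "__main__":
    print(validate_buffer_image_copy(D3D12_LIMITS, 512, 256))
    print(validate_buffer_image_copy(D3D12_LIMITS, 100, 300))
```

An application would query the limits once at device creation and lay out its staging data accordingly, so the same copy code runs on every backend.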

Blits

D3D12 doesn't support any sort of blitting, so I'm inclined to propose no workarounds here. Users doing simple render passes for blitting textures shouldn't be slower than emulating this in the API, anyway.

Afterword

This analysis may be incomplete; corrections are welcome as direct edits to the issue.

Aug 09 '17 16:08 kvark

In Vulkan, these are transfer operations, supported on any queue type

To clarify, clear commands are not supported on transfer queues even though they count as transfer operations. vkCmdClearColorImage requires graphics or compute queues, vkCmdClearDepthStencilImage requires graphics support.

Aug 09 '17 16:08 msiglreith

Note: Metal's texture-to-texture copies don't specify the destination size, and nowhere it is said to be required to match the source size. I assume Metal scales the result to fit the whole destination slice, but it would be great to get clarification from Apple.

I don't believe there is any scaling. It's just a copy of that rectangle into the destination, at a specified origin.

Aug 09 '17 20:08 grorg

@grorg thanks for clarification! I removed the note from the body now. It seems a little strange to me that Metal doesn't provide scaling for blits yet has a generateMipmaps routine.

Aug 09 '17 20:08 kvark

Thanks for the nice analysis! I'd like to point out that in NXT we have found a way to abstract the D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT requirement by splitting copies in two parts if needed. The code handling this can be found here. It is covered by extensive tests, so we are confident it works (though it is only implemented for 2D textures).

With respect to the proposed API:

Having no way to do "updates" would work but we have found it extremely useful to have an immediate nxtBufferSetSubData for tests (not even a queued operation). If we go with no updates someone will need to make a "blessed" helper library to do good enough buffer updates (outside of compute / render passes, of course).
No blits sounds ok.
Copies: this ties into another topic; ideally there would be default constraints that are validated and work on all platforms. Additionally we could provide a way for an application to discover smaller constraints and explicitly require it at device creation time. From our experiments we think only the rowPitch constraint will need to be present (and maybe image height for 2D arrays / 3D textures).
Clears: if the clears are done outside of compute / render passes, then they could be emulated with an empty MTLRenderCommandEncoder. I'd be interested in knowing why Metal didn't find it necessary to allow clears in MTLBlitCommandEncoder.

It seems a little strange to me that Metal doesn't provide scaling for blits yet has a generateMipmaps routine.

It sounds like it could be a built-in compute shader.

Aug 09 '17 20:08 Kangz

For any kind of resource (texture or buffer) easy clearing would be highly desirable for compute shader usecases which don't have any way of doing clear via new RenderPasses. I hit this in my wgpu(-rs) based fluid sim quite a bit where I have accumulating or temp targets (either buffer or volume textures) that need periodical clearing. Another more common usecase would be e.g. a histogram that is computed per frame using a compute shader - every frame the buffer for the histogram needs clearing.

Today, all of these cases required specialized clear passes and in many cases (even more to the annoyance of any users) bindgroups, layouts etc. I think ideally WebGPU would land at a clear function on the encoder for both textures and buffers, comparable to the copy texture/buffer methods it already provides.

Mar 17 '21 15:03 Wumpf

Today, all of these cases required specialized clear passes and in many cases (even more to the annoyance of any users) bindgroups, layouts etc.

Can you explain how it requires bindgroups, layouts, etc.? beginRenderPass for clearing doesn't need them. Or do you want to clear a texture inline in a compute pass? If that's the case then you can do it with your own dispatches and encapsulate them in a function. I agree that the implementation could do it for you, but that seems like something we can add later (so we can ship the first version of WebGPU faster).

Mar 17 '21 15:03 Kangz

Yes, I was referring to the usecase of having a dedicated compute pass. For everything where a beginRenderPass works, a separate clear function isn't truly needed anyways.

It's ofc true that a user can encapsulate something like this in a function, but this way of clearing requires a lot of bookkeeping that the clear functions of dx12/vulkan don't need: for every type (image format, image dimension etc.) there is a special compute pipeline needed. Every single resource that needs clearing also needs a bind group just containing this one resource.
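
To show the shape of that bookkeeping, here is a small stand-in written without any GPU API: strings take the place of real pipeline and bind group objects, and ComputeClearCache and its methods are hypothetical names. It only illustrates how the two caches grow, one entry per (format, dimension) pair and one entry per resource.

```python
class ComputeClearCache:
    """Tracks what a user-side compute clear helper would have to keep alive."""

    def __init__(self):
        self.pipelines = {}    # (format, dimension) -> pipeline stand-in
        self.bind_groups = {}  # resource id -> bind group stand-in

    def prepare(self, resource_id: str, fmt: str, dimension: str):
        """Return the (pipeline, bind_group) pair needed to clear a resource."""
        key = (fmt, dimension)
        if key not in self.pipelines:
            # A real helper would compile a specialized clear shader here.
            self.pipelines[key] = "pipeline<%s,%s>" % key
        if resource_id not in self.bind_groups:
            # A real helper would create a bind group holding only this resource.
            self.bind_groups[resource_id] = "bind_group<%s>" % resource_id
        return self.pipelines[key], self.bind_groups[resource_id]


if __name__ == "__main__":
    cache = ComputeClearCache()
    cache.prepare("density", "r32float", "3d")
    cache.prepare("velocity", "rgba16float", "3d")
    cache.prepare("pressure", "r32float", "3d")
    print(len(cache.pipelines), len(cache.bind_groups))
```

A built-in clear on the encoder would let the implementation own this state instead of every application rebuilding it.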

Mar 17 '21 15:03 Wumpf

@Wumpf would you be willing to write down a more concrete proposal (in a separate issue), with the following:

description of the use cases. I.e. clearing buffer and texture data before doing compute passes with them, where RENDER_ATTACHMENT is not even needed otherwise.
the implementation paths and costs on each platform. Comparison to what the user could do on their own.
suggested API

Sep 17 '21 14:09 kvark

Metal has a BlitCommandEncoder::fillBuffer function.

Sep 13 '22 03:09 hamwj1991
