Recycling ArrayBuffer in createBufferMapped
I've been working to port our WebGL game engine (www.construct.net) to WebGPU and got as far as running some performance benchmarks. I found one which is ~10% slower with WebGPU than WebGL, and a notable difference in the profile is createBufferMapped() taking ~10% of CPU time. Where WebGL does gl.bufferSubData(), in WebGPU we use the pattern of: createBufferMapped(), copy data, unmap(), copyBufferToBuffer(), then destroy that buffer after the next submit().
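For reference, the upload pattern looks roughly like the sketch below. It assumes the draft-era createBufferMapped(), which returned a [GPUBuffer, ArrayBuffer] pair. `uploadViaStaging` is a made-up helper name, and the queue and the copy-source usage flag are passed in as parameters because how you obtain them depends on the API revision.

```javascript
// Sketch of the per-upload staging pattern; not a definitive implementation.
// Assumption: createBufferMapped() returns [GPUBuffer, ArrayBuffer].
// uploadViaStaging is a hypothetical helper, not part of WebGPU.
function uploadViaStaging(device, queue, dstBuffer, dstOffset, data, copySrcUsage) {
  // 1. Create a staging buffer that is already mapped for writing.
  //    This is where a new, zeroed ArrayBuffer gets allocated.
  const [staging, mapping] = device.createBufferMapped({
    size: data.byteLength,
    usage: copySrcUsage,
  });

  // 2. Copy the bytes of the typed array into the mapping.
  new Uint8Array(mapping).set(
    new Uint8Array(data.buffer, data.byteOffset, data.byteLength));

  // 3. Unmap so the GPU can read the staging buffer.
  staging.unmap();

  // 4. Record and submit the buffer-to-buffer copy.
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(staging, 0, dstBuffer, dstOffset, data.byteLength);
  queue.submit([encoder.finish()]);

  // 5. The staging buffer is not needed once the copy has been submitted.
  staging.destroy();
}
```

This version submits immediately to keep the sketch short. In the engine the copy is recorded with other commands and the staging buffer is destroyed after the next submit().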
I can see a lot of buffer upload discussion has already happened in #138, so I apologise if this ground has been covered. But it seems that createBufferMapped() adds CPU overhead in the JS environment relative to WebGL's gl.bufferSubData(), because it must both create and zero a new ArrayBuffer, which later has to be garbage-collected.
Would it be possible to add a way to re-use an existing ArrayBuffer to eliminate this overhead? Instead of createBufferMapped() creating a new ArrayBuffer, perhaps you could pass one created in JS. Or more strictly, have a way to map the buffer for writing again, but with exactly the same ArrayBuffer you got from the original call to createBufferMapped(). The aim would essentially be to achieve the same buffer upload process, but without the WebGPU API calls having to create their own ArrayBuffer.
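To illustrate the difference I mean, here is a plain JS sketch with no WebGPU calls at all. The function names are made up for illustration, and it only shows the two allocation shapes, not how a browser would actually implement either one.

```javascript
// Shape 1: a new zero-filled ArrayBuffer per upload, which becomes
// garbage once the upload is done (roughly what each
// createBufferMapped() call implies on the JS side).
function stageWithFreshBuffer(data) {
  const fresh = new ArrayBuffer(data.byteLength); // allocated and zeroed
  new Uint8Array(fresh).set(data);
  return fresh;
}

// Shape 2: one ArrayBuffer allocated up front and overwritten on each
// upload, so the hot path does no allocation and leaves nothing to GC.
function makeReusableStager(capacity) {
  const reused = new ArrayBuffer(capacity);
  const view = new Uint8Array(reused);
  return function stage(data) {
    if (data.byteLength > capacity) {
      throw new RangeError("data exceeds staging capacity");
    }
    view.set(data); // overwrite in place, no new allocation
    return reused;
  };
}

// Usage: both take a Uint8Array of bytes to upload.
const stage = makeReusableStager(1024);
const a = stage(new Uint8Array([1, 2, 3]));
const b = stage(new Uint8Array([4, 5, 6]));
// a and b are the same ArrayBuffer; stageWithFreshBuffer() would
// return a distinct one each time.
```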
This is an area in flux, so expect changes to the API fairly soon. We don't want to get into suggestions here that we've covered elsewhere, but thanks for the report!
OK, I'd be interested to check out any further discussions/changes if you can point me in the right direction!
Dump of related issues (highly varied in how related they are): #418 #501 #506 #509 #511 #512 #516 #536 #555 #594 #605 #649 #650 #680
Today we agreed upon #605 with small tweaks; later we'll be discussing whether to add #509.
So yes, lots of past discussion 🙈 I'm happy to adjust our engine to any new proposal and test the real-world performance relative to WebGL.
It seems that #605 is kind of what I wanted. What I originally proposed was something akin to #605, except that you specify the ranges up front, outside the hot loop, which reduces the bookkeeping inside the hot loop.