GPUImage3

MTLCommandEncoder Type

Open Danny1451 opened this issue 6 years ago • 1 comments

I have some questions about the MTLCommandEncoder type. In BasicOperation, all the operations, such as color processing and blends, use an MTLRenderCommandEncoder to process the image and pass it to the next ImageConsumer. Why not use an MTLComputeCommandEncoder to do the image processing work? Not every filter needs to render to a view, and MTLComputeCommandEncoder can do data-parallel compute, which may be more efficient. Are there particular reasons for using MTLRenderCommandEncoder?

Danny1451, Aug 01 '18 10:08

The two reasons we are initially using render operations over compute are a) ease of porting and b) the last time I checked, the render pipeline was faster than compute for equivalent operations. The previous iterations of GPUImage used OpenGL (ES) shaders and by necessity were built around a rendering architecture. That makes it easier to translate those operations into another rendering architecture, this time in Metal, instead of reworking them for compute right off the bat.

An old Apple Developer Forum thread talks about performance differences that people observed initially when it comes to compute vs. render operations in Metal. I saw something similar in my initial tests years ago, but I don't know if any of that is still the case today. This hardware was originally built and optimized for rendering, but so much has been done on the compute side over the last few years that none of that may be true anymore.
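For readers comparing the two approaches, here is a minimal sketch of how the same single-input filter pass might be encoded each way. It is not taken from the GPUImage3 source. The function names, texture indices, and the assumption that the vertex shader builds a full-screen quad from `vertex_id` are all hypothetical, and the pipeline states and textures are assumed to be created elsewhere.

```swift
import Metal

// Hypothetical helper: number of threadgroups needed to cover `pixels`
// along one axis (rounds up so edge pixels are not skipped).
func threadgroupsNeeded(pixels: Int, threadsPerGroup: Int) -> Int {
    return (pixels + threadsPerGroup - 1) / threadsPerGroup
}

// Render path: the fragment shader runs once per output pixel.
// Assumes `output` was created with `.renderTarget` usage and that the
// pipeline's vertex function emits a full-screen quad without a vertex buffer.
func encodeRenderPass(commandBuffer: MTLCommandBuffer,
                      pipeline: MTLRenderPipelineState,
                      input: MTLTexture,
                      output: MTLTexture) {
    let descriptor = MTLRenderPassDescriptor()
    descriptor.colorAttachments[0].texture = output
    descriptor.colorAttachments[0].loadAction = .dontCare // every pixel is overwritten
    descriptor.colorAttachments[0].storeAction = .store

    guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }
    encoder.setRenderPipelineState(pipeline)
    encoder.setFragmentTexture(input, index: 0)
    encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
    encoder.endEncoding()
}

// Compute path: the kernel runs once per thread in a 2D grid.
// Assumes `output` was created with `.shaderWrite` usage and that the kernel
// bounds-checks its thread position, since the grid is rounded up.
func encodeComputePass(commandBuffer: MTLCommandBuffer,
                       pipeline: MTLComputePipelineState,
                       input: MTLTexture,
                       output: MTLTexture) {
    guard let encoder = commandBuffer.makeComputeCommandEncoder() else { return }
    encoder.setComputePipelineState(pipeline)
    encoder.setTexture(input, index: 0)
    encoder.setTexture(output, index: 1)

    let groupWidth = pipeline.threadExecutionWidth
    let groupHeight = pipeline.maxTotalThreadsPerThreadgroup / groupWidth
    let threadsPerGroup = MTLSize(width: groupWidth, height: groupHeight, depth: 1)
    let groups = MTLSize(
        width: threadgroupsNeeded(pixels: output.width, threadsPerGroup: groupWidth),
        height: threadgroupsNeeded(pixels: output.height, threadsPerGroup: groupHeight),
        depth: 1)
    encoder.dispatchThreadgroups(groups, threadsPerThreadgroup: threadsPerGroup)
    encoder.endEncoding()
}
```

The render version maps closely onto an OpenGL ES-style fragment shader pass, which fits the porting argument above. The compute version needs explicit threadgroup sizing and a bounds check in the kernel.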

This is something that we'll clearly be benchmarking as we get things stabilized, along with side-by-side comparisons with Metal Performance Shaders for relevant operations. I'm sure we'll learn quite a bit from that. If it turns out that compute allows for better performance, we'll rework to target that.

The current simple shaders we have run so quickly on Metal-supporting devices that it's hard to benchmark differences in their performance. You're looking at maybe sub-millisecond timing changes on current hardware, so we'll need to come up with good test cases and run them on hardware as old as we can find, on up.
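One way such sub-millisecond differences might be measured is to read the GPU timestamps on each command buffer and take a median over many runs. This is a rough sketch under stated assumptions, not the project's benchmark harness. `medianGPUTimeMilliseconds` and `median` are hypothetical names, and the `encode` closure stands in for whichever render or compute pass is being timed. `gpuStartTime` and `gpuEndTime` are only available on newer OS versions (iOS 10.3 and macOS 10.15 onward).

```swift
import Metal

// Median of a list of samples; returns nil for an empty list.
func median(_ values: [Double]) -> Double? {
    guard !values.isEmpty else { return nil }
    let sorted = values.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Hypothetical benchmark helper: runs `encode` on a fresh command buffer
// `iterations` times and returns the median GPU execution time in milliseconds.
@available(macOS 10.15, iOS 10.3, *)
func medianGPUTimeMilliseconds(queue: MTLCommandQueue,
                               iterations: Int,
                               encode: (MTLCommandBuffer) -> Void) -> Double? {
    var samples: [Double] = []
    for _ in 0..<iterations {
        guard let commandBuffer = queue.makeCommandBuffer() else { return nil }
        encode(commandBuffer)
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()
        // gpuStartTime and gpuEndTime are reported in seconds.
        samples.append((commandBuffer.gpuEndTime - commandBuffer.gpuStartTime) * 1000.0)
    }
    return median(samples)
}
```

A median is used here rather than a mean so that an occasional slow run (for example, the first run after pipeline creation) does not dominate the result. Larger textures or chained passes would likely be needed to pull the timings out of the noise.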

BradLarson, Aug 01 '18 14:08