PostProcessing
V2 - Fix threadgroup size warning on MacOS Metal
Fix for #580
Summary:
- Add runtime utilities property to query if using Metal on MacOS
- Reduced lerp & LUT threadgroup size on MacOS Metal to stay under threadgroup size limit
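As a rough illustration of the second point: the thread count of a group is the product of its per-axis sizes, and that product is what has to stay under the limit. The sketch below is not the PR's code. The function names, the 8x8x8 and 4x4x4 sizes, and the limit of 256 are assumptions chosen for illustration only.

```python
from math import prod

# Illustrative limit only; a real limit depends on the device and the kernel.
ASSUMED_LIMIT = 256


def total_threads(numthreads):
    """Threads in one group: the product of the per-axis sizes."""
    return prod(numthreads)


def fits_limit(numthreads, limit):
    """True if one group of this shape stays within the given limit."""
    return total_threads(numthreads) <= limit


def group_counts(extent, numthreads):
    """Groups to dispatch per axis so every element is covered.

    Uses integer ceiling division, so a partial group is added
    whenever the extent is not a multiple of the group size.
    """
    return tuple(-(-e // n) for e, n in zip(extent, numthreads))
```

With these assumed numbers, an 8x8x8 group (512 threads) exceeds the limit while a 4x4x4 group (64 threads) fits. Shrinking the group means dispatching more groups to cover the same volume, which is what `group_counts` computes.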
I'm not saying I understand the ins and outs of the Metal API, but this PR looks like a step in the right direction.
- https://developer.apple.com/documentation/metal/compute_processing/calculating_threadgroup_and_grid_sizes
- https://developer.apple.com/documentation/metal/mtlcomputepipelinestate/1414927-maxtotalthreadsperthreadgroup
I would love to see this merged or worked on at some point (assuming someone in the Post Processing team owns a MacBook.)
My hope:
- Test whether the existing threadgroup warning causes post-processing effects to fail.
- Provide some helpers to make future Metal support easier. For example, on top of SHADER_API_METAL, could we get a METAL_MAX_THREAD_IN_GROUP define that reflects the limit of the current device?
Yes. The real solution is some way to query the threadgroup size limit.
Unfortunately, things are a little complicated: the maximum threadgroup size varies by both device and the particular kernel, so it has to be determined per kernel at runtime, which may not be possible with a simple define such as METAL_MAX_THREAD_IN_GROUP (see the maxTotalThreadsPerThreadgroup reference).
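To make the per-kernel idea concrete, here is a minimal sketch of what a runtime helper could do once it has a limit for a specific kernel. It assumes two values are available at runtime: a per-kernel maximum (what Metal exposes as maxTotalThreadsPerThreadgroup) and the execution width, since Apple's guidance is to keep the threadgroup size a multiple of it. The function names and the way the values are obtained are hypothetical. This is plain arithmetic, not a Unity or Metal API.

```python
def pick_threadgroup_size(requested, max_total, exec_width):
    """Pick a 1D threadgroup size for one kernel.

    requested  -- size the kernel author would like to use
    max_total  -- per-kernel limit, assumed to be queried at runtime
    exec_width -- execution width, assumed to be queried at runtime

    Returns the largest multiple of exec_width that does not exceed
    min(requested, max_total). If that cap is smaller than exec_width,
    the cap itself is returned.
    """
    if requested < 1 or max_total < 1 or exec_width < 1:
        raise ValueError("all arguments must be positive")
    capped = min(requested, max_total)
    if capped < exec_width:
        return capped
    return (capped // exec_width) * exec_width


def group_count(work_items, group_size):
    """Number of groups needed to cover work_items (ceiling division)."""
    return -(-work_items // group_size)
```

For instance, a kernel that asks for 1024 threads with an assumed limit of 256 and an execution width of 32 would be given 256 threads per group, and 1000 work items would then need 4 groups. A compile-time define cannot do this, because the limit is only known once the kernel is compiled for a specific device.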
Regarding whether the warning leads to effects failing: in my limited testing (MacBook Pro 13" 207 and iPad Air 2), it looks like the max threadgroup size reported in the warning may differ from the number of threads at which the device can actually run the given kernel.
For example, a trivial test kernel I wrote to use 1024 threads (so it always triggers the warning) reported the same max threadgroup size of 256 in the warning on both the MacBook and the iPad. However, the MacBook still happily dispatched the kernel and executed all 1024 threads, whereas the same kernel failed to execute on the iPad. I'm working on a test project for this and checking whether this discrepancy has already been reported. If it holds up, kernel development on Metal is very much working in the dark: there is no reliable way to determine the actual threadgroup size limits aside from trial and error for every kernel on every target device, which is untenable.