V2 - Fix threadgroup size warning on MacOS Metal

Open MayaViolet opened this issue 7 years ago • 2 comments

Fix for #580

Summary:

  • Add runtime utilities property to query if using Metal on MacOS
  • Reduced lerp & LUT threadgroup size on MacOS Metal to stay under threadgroup size limit

MayaViolet avatar Jun 15 '18 19:06 MayaViolet

I don't claim to understand the ins and outs of the Metal API, but this PR looks like a step in the right direction.

  • https://developer.apple.com/documentation/metal/compute_processing/calculating_threadgroup_and_grid_sizes
  • https://developer.apple.com/documentation/metal/mtlcomputepipelinestate/1414927-maxtotalthreadsperthreadgroup

I would love to see this merged or worked on at some point (assuming someone in the Post Processing team owns a MacBook.)

My hope:

  • Test whether the existing threadgroup warning causes post-processing effects to fail.
  • Provide some helpers to make future Metal support easier. For example, on top of SHADER_API_METAL, could we get a METAL_MAX_THREAD_IN_GROUP that reflects the limit of the current device?

bitinn avatar Jun 24 '18 09:06 bitinn

Yes. The real solution is some way to query the threadgroup size limit.

Unfortunately, things are a little complicated: the maximum threadgroup size varies by both device and the particular kernel, so it has to be determined per kernel at runtime, which may not be possible with a simple define such as METAL_MAX_THREAD_IN_GROUP (see the maxTotalThreadsPerThreadgroup reference).
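To illustrate why a runtime value is needed rather than a compile-time define, here is a minimal sketch in plain Python (not Unity or Metal code). It assumes the per-kernel limit has already been obtained somehow, for example from maxTotalThreadsPerThreadgroup, and is passed in as `max_total_threads`. The function names `fit_threadgroup` and `group_count` are hypothetical, and the halving strategy is just one simple way to shrink a group, not what this PR does.

```python
from math import ceil


def fit_threadgroup(desired_x, desired_y, max_total_threads):
    """Shrink a 2D threadgroup until x * y fits under the per-kernel limit.

    max_total_threads is assumed to come from a runtime query for this
    specific kernel on this specific device (hypothetical input here).
    """
    x, y = desired_x, desired_y
    while x * y > max_total_threads:
        # Halve the larger dimension first to keep the group roughly square.
        if x >= y and x > 1:
            x //= 2
        elif y > 1:
            y //= 2
        else:
            break  # cannot shrink below 1 x 1
    return x, y


def group_count(width, height, group):
    """Number of threadgroups needed to cover a width x height grid."""
    gx, gy = group
    return ceil(width / gx), ceil(height / gy)


# A kernel written for 32 x 32 threads, on a device reporting a limit of 256.
group = fit_threadgroup(32, 32, 256)
print(group)                           # (16, 16)
print(group_count(1920, 1080, group))  # (120, 68)
```

In a shader where the threadgroup size is fixed at compile time, the equivalent would presumably be choosing between precompiled kernel variants of different sizes, which is an assumption on my part rather than something tested here.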

Regarding whether the warning leads to effects failing: in my limited testing (MacBook Pro 13" 2017 and iPad Air 2), it looks like the max threadgroup size used to issue the warning may differ from the number of threads at which the device can actually run the given kernel.

For example, a trivial test kernel I wrote to use 1024 threads (so it always triggers the warning) reported the same max threadgroup size of 256 in the warning on both the MacBook and the iPad. However, the MacBook still happily dispatched the kernel and executed all 1024 threads, whereas the same kernel failed to execute on the iPad. I'm working on a test project for this and checking whether this discrepancy has already been reported. If it holds up, kernel development on Metal is very much working in the dark: there is no reliable way to determine the actual threadgroup size limits aside from trial and error for all kernels on all target devices, which is untenable.

MayaViolet avatar Jun 25 '18 03:06 MayaViolet