igl icon indicating copy to clipboard operation
igl copied to clipboard

【Metal】The DeviceFeatureLimits::BufferAlignment is not always equals 16 byte on Metal backend

Open vinsentli opened this issue 10 months ago • 12 comments

The DeviceFeatureLimits::BufferAlignment is not always equals 16 byte on Metal backend. For example , on iOS simulator is 256.

https://developer.apple.com/documentation/metal/developing_metal_apps_that_run_in_simulator When you set arguments for the render or compute command, align constant buffer offsets to 256 bytes.

And I have not test the value on MacOS. Maybe it is same with iOS simulator. I cannot verify it.

From the https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf it shows 256B on M2. image

vinsentli avatar Apr 30 '24 08:04 vinsentli

@corporateshark has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Apr 30 '24 20:04 facebook-github-bot

@vinsentli are you running into this issue while trying to allocate a uniform buffer or a buffer to hold the texture?

Please give an example of the usage so we can better assess next steps.

syeh1 avatar May 01 '24 17:05 syeh1

@vinsentli Could you please share some test code snippet or a usage example showing how exactly this value is being used?

corporateshark avatar May 01 '24 17:05 corporateshark

@syeh1 @corporateshark Does DeviceFeatureLimits::BufferAlignment mean GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT? I fix the same problem on opengl backend in another pull request: https://github.com/facebook/igl/pull/113

vinsentli avatar May 01 '24 22:05 vinsentli

https://www.khronos.org/opengl/wiki/Uniform_Buffer_Object#Data_storage If you bind a uniform buffer with glBindBufferRange, the offset field of that parameter must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT (this is a global value, not a per-program or per-block one). Thus, if you want to put the data for multiple uniform blocks in a single buffer object, you must make sure that the data for each within that block matches this alignment.

vinsentli avatar May 01 '24 22:05 vinsentli

This is my use scene: put the data for multiple uniform blocks in a single buffer object.

In that scene,must make sure that the data for each within that block matches this alignment.

vinsentli avatar May 01 '24 22:05 vinsentli

Ok, got it. In that case, iOS is 256b, but M2 would be 32b (looking at the Minimum Constant Buffer offset Alignment row)

syeh1 avatar May 01 '24 23:05 syeh1

Ok, got it. In that case, iOS is 256b, but M2 would be 32b (looking at the Minimum Constant Buffer offset Alignment row)

On m2 is 32B,means 128byte.@syeh1

vinsentli avatar May 01 '24 23:05 vinsentli

I update the value on macos. So the alignment on iOS simulator is 256byte, on macos is 128 byte, on iOS phones is 16 byte.

vinsentli avatar May 01 '24 23:05 vinsentli

@corporateshark has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar May 02 '24 17:05 facebook-github-bot

I'm trying to review this, but things are very unclear. Let's start by establishing some things:

  • If we want to keep the OpenGL and Metal PRs separate, let's not mix them in the conversation. It just adds noise.
  • The Metal tables document uses uppercase B for bytes, not bits. IGL's values are also in bytes. There should be no conversion between the two.
  • GPU families got weird after M1 macs. Starting with Apple7, mobile and mac GPUs are in the same family. The limits in the mac2 column represent Intel GPUs, but note that arm macs will also report support for the mac2 family. All of this is in the document linked above.
    • Note that IGL does not distinguish between arm and Intel macs, it just looks at the platform.

Now, onto the speculative:

  1. If I understand the motivation, the type of alignment that matters here is for "constant buffers".
  2. DeviceFeatureLimits::BufferAlignment doesn't necessarily map to constant buffers only, but buffers in general. We might have to add a new enum for it, but we can also just go with the more restrictive of all use cases -- and I say let's go that way until someone comes up with actual code exposing some issue/limitation, as otherwise it's just guessing.
  3. According to the Metal table, iOS physical device alignment should be 16, and mac should be 32. And according to the simulator guide, alignment should be 256. I think these are the more correct numbers to use.
  4. HOWEVER, if we also want to account for the "Buffer alignment for copying an existing texture to a buffer" row given the lack of granularity in IGL's enums, we should use 256 as the alignment for mac as well.

IMO, given the lack of real examples, we should go with this simpler/safer alternative:

#if IGL_PLATFORM_MACOS || IGL_PLATFORM_IOS_SIMULATOR
    result = 256;
#else
    result = 16;
#endif

Thoughts?

tgoulart avatar May 02 '24 19:05 tgoulart

There is a easy way to verify the alignment value on MacOS。 image

In the ColorSession demo, modify the bufferOffset value from 0 to 1, the Xcode will report an error.

I verify the value on MacOS with arm GPU(Apple M1 Pro), the alignment value is 16 bytes. 【the offset into the buffer color that is bound at buffer index 0 must be a multiple of 16】

image

It aligns with the instructions provided in the documentation.

image image

I verify the value on MacOS with Intel GPU(Apple M1 Pro), the alignment value is 32 bytes. 【the offset into the buffer color that is bound at buffer index 0 must be a multiple of 32】 image

Indeed, it has been confusing me.

Yes , use 256 bytes on MacOS more safer. @tgoulart @corporateshark

#if IGL_PLATFORM_MACOS || IGL_PLATFORM_IOS_SIMULATOR result = 256; #else result = 16; #endif

vinsentli avatar May 05 '24 05:05 vinsentli

@corporateshark merged this pull request in facebook/igl@0df3ce0d78eeb0d362a704a707dd7d02421fd525.

facebook-github-bot avatar Jun 14 '24 22:06 facebook-github-bot