Metal Heaps default ON causes rendering corruption in RBDoom3-BFG
Compiling from MoltenVK master (and all deps up-to-date), with Metal Heaps set to default ON I get graphics corruption:
When I set MVK_CONFIG_USE_MTLHEAP = 0 I get correct behaviour:
This is tested on an x86-based iMac + AMD 6600XT running the latest Ventura release (13.7.4). There definitely appears to be something wrong here, but I don't know how to debug this one.
I will try this again on my M1 Air and report back when I have results.
UPDATE: I was kind of expecting this, but it works fine with default setting ON on my MacBook M1 Air (also running Ventura). So, it looks like another x86 + AMD difference vs. M* GPUs. Does this help in any way for tracking it down?
@cdavis5e I have looked deeper into this problem and I think I see what is going on:
- The RBDoom3_BFG game and the Vulkan-Samples reference project (select samples) both show evidence of this problem on x86 + AMD machines.
- Both projects use AMD's VMA allocator for allocating Vulkan memory (a standard utility which is heavily used).
- Before allocating any memory, VMA queries the structure VkMemoryDedicatedRequirements using the pNext chain of vkGetImageMemoryRequirements2(). This returns the structure with elements prefersDedicatedAllocation and requiresDedicatedAllocation. MoltenVK currently returns both of these as false.
- However, if either of these were to be true, VMA would append the structure VkMemoryDedicatedAllocateInfo (with image handle) to the pNext chain of VkMemoryAllocateInfo when calling vkAllocateMemory().
- If I force either prefersDedicatedAllocation or requiresDedicatedAllocation to be true, or set the VMA flag VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT, the graphics corruption problem cited here goes away.
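The decision described in the list above can be sketched roughly as follows. This is a simplified model, not VMA's actual code: `DedicatedRequirements` is a hypothetical stand-in for the two fields of VkMemoryDedicatedRequirements, and `useDedicatedAllocation` and `appRequestedDedicated` are illustrative names (the latter standing for VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT). A real allocator may weigh other factors as well.

```cpp
// Hypothetical stand-in for the two VkBool32 fields that
// vkGetImageMemoryRequirements2() fills in via VkMemoryDedicatedRequirements.
struct DedicatedRequirements {
    bool prefersDedicatedAllocation;
    bool requiresDedicatedAllocation;
};

// Returns true when the image should get its own VkDeviceMemory (with
// VkMemoryDedicatedAllocateInfo chained to VkMemoryAllocateInfo) instead of
// being sub-allocated from a larger block.
// Assumption: only the three inputs named in this thread are considered.
constexpr bool useDedicatedAllocation(const DedicatedRequirements& reqs,
                                      bool appRequestedDedicated) {
    return reqs.requiresDedicatedAllocation ||
           reqs.prefersDedicatedAllocation ||
           appRequestedDedicated;
}
```

With both fields false and no flag set, which matches the MoltenVK behaviour reported above, this model always chooses sub-allocation.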
Note this problem may be showing up due to AMD's segmented memory map/heap for their GPUs on macOS. This map is different than Apple Silicon GPUs with unified memory. For my AMD 6600 GPU there are three types of memory available:
1. Device local memory only (heap 0).
2. Shared memory that is host visible, host coherent, and host cached (heap 1).
3. Device local memory that is host visible and host cached (heap 0).
When uploading textures, memory type 1 is typically chosen by VMA and other allocators. This is where the failure occurs if VkMemoryDedicatedAllocateInfo is not set during image creation.
Note I have found a second way to work around the issue, and that is to force textures to use memory type 3 above. However, that requires application changes that are not required on other platforms like Windows or Linux, i.e. it is bad for compatibility.
When MTL Heaps are set ON (now the default) should MoltenVK be returning either prefersDedicatedAllocation or requiresDedicatedAllocation as true in the structure VkMemoryDedicatedRequirements, or alternatively, should VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT be set by the app when using the VMA allocator?
Or is this a symptom of an underlying defect or issue with the current MTL Heaps implementation?
Someone can correct me if I’m wrong, but I’m pretty sure enabling dedicated allocations basically just bypasses using MTLHeap to create resources with dedicated memory using the old behavior.
I’m pretty sure enabling dedicated allocations basically just bypasses using MTLHeap
If that's the case, then perhaps we simply have a defect in MoltenVK when using MTLHeap with the device local memory segment for 2D textures on x86 + AMD GPUs.
As @squidbus indicates above, using dedicated allocations is only a band-aid for this issue, and not really the right solution.
I have looked a bit further and it seems the issue occurs when using memory sub-allocations (as enabled by VMA) for certain image formats with VK_IMAGE_TILING_OPTIMAL. In RBDoom3-BFG, the formats that cause a negative interaction with MTLHeap are VK_FORMAT_R8_UNORM and VK_FORMAT_R16G16_SFLOAT. However, if I change those specific image allocations to VK_IMAGE_TILING_LINEAR the issue disappears, even with dedicated allocations disabled.
I don't know enough about how MTLHeap and VMA's Vulkan sub-allocations interact to understand this issue. However, given my observation above, I suspect it may have something to do with granularity and/or padding when various image sub-allocations of different formats/usages participate in a larger allocation. VMA is supposed to manage this for you, but perhaps MTLHeap is interfering somehow.
Hopefully this additional info may provide a clue for a proper solution.
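For readers unfamiliar with the granularity and padding concern raised above, here is a minimal sketch of how a generic sub-allocator might pick the offset of the next resource inside a larger block. It is not VMA's or MoltenVK's code and does not diagnose the corruption. `SubAllocRequest` and `placeAfter` are hypothetical names, and `granularity` stands for the device's bufferImageGranularity limit. Both alignments are assumed to be powers of two, as Vulkan guarantees.

```cpp
#include <algorithm>
#include <cstdint>

// Round `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::uint64_t alignUp(std::uint64_t offset, std::uint64_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical description of one resource to be sub-allocated.
struct SubAllocRequest {
    std::uint64_t size;       // as reported in VkMemoryRequirements::size
    std::uint64_t alignment;  // as reported in VkMemoryRequirements::alignment
    bool optimalTiling;       // true for VK_IMAGE_TILING_OPTIMAL images
};

// Offset at which `next` may start, given that the previous resource in the
// block ends at `prevEnd`. When the neighbours differ in tiling kind
// (linear vs. optimal), the offset is also aligned to `granularity` so the
// two resources do not share a granularity-sized page.
constexpr std::uint64_t placeAfter(std::uint64_t prevEnd,
                                   bool prevOptimalTiling,
                                   const SubAllocRequest& next,
                                   std::uint64_t granularity) {
    std::uint64_t alignment = next.alignment;
    if (prevOptimalTiling != next.optimalTiling) {
        alignment = std::max(alignment, granularity);
    }
    return alignUp(prevEnd, alignment);
}
```

If a backing heap applied different size or alignment rules than the ones the allocator was told about, offsets computed this way could overlap. That is one way the suspicion above could play out, though nothing here confirms it.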
Last comment here. When looking at older version documentation (inside mvk_config.h), I see the following for the useMTLHeap (MVK_CONFIG_USE_MTLHEAP) configuration parameter:
* Apple recommends that MTLHeaps should only be used for specific requirements such as aliasing
* or hazard tracking, and MoltenVK testing has shown that allocating multiple textures of
* different types or usages from one MTLHeap can occassionally cause corruption issues under
* certain circumstances.
I suspect this is what I am seeing. As a result, I will be setting MVK_CONFIG_USE_MTLHEAP=0 for RBDoom3BFG. I have no need for VK_EXT_image_2d_view_of_3d in this application, so hopefully there will be no negative impact.
On a side note, why has detailed documentation for MoltenVK config parameters been removed from current versions? All I can find in current header files are one-liners which may not be that helpful.
@SRSaunders Sorry to jump in with a different question, but is the HUD in the screenshot built into RBDoom3-BFG, or is it a custom (system-wide) one? If it is a generic one, can you point me to info on it? Thanks.
@SRSaunders:
Yeah, I've actually known about this for a long time. And it is specific to AMD GPUs--it doesn't even happen to my knowledge on Intel GPUs, let alone Apple Silicon. It was the reason I flipped the default back to dedicated resources--when I added support for MTLHeap-backed memory, I wanted to change it to do that by default, but then I stumbled on this running the CTS on my AMD GPU MacBook Pro. I filed a feedback with Apple a long time ago, when I was still with CW, complaining about this, but so far, they've never bothered with fixing it. I can't imagine it's a high priority, given that their primary focus now is Apple Silicon.
Thanks @cdavis5e for the follow-up. Based on your answer, I guess my MVK_CONFIG_USE_MTLHEAP = 0 setting is the correct solution for AMD GPUs. And I agree that Apple will likely not fix this now given their focus on Apple Silicon.
However, perhaps you could add a note about this in the docs or header files somewhere, especially since MVK_CONFIG_USE_MTLHEAP = 1 is now set by default. Thanks.
Sorry to jump in with a different question, but is the HUD in the screenshot built into RBDoom3-BFG, or is it a custom (system-wide) one? If it is a generic one, can you point me to info on it? Thanks.
@oscarbg the HUD is an in-game solution using internal counters leveraging the Imgui framework for display.
@SRSaunders @cdavis5e
Then, is a solution to have MoltenVK default to not use MTLHeaps when running on AMD?
If so, I can make that change. I certainly think it's better to have it work out of the box on AMD, than the current situation of crashing until the app works around it.
Since MVK_CONFIG_USE_MTLHEAP is instance and global, but AMD is physical device, I'll probably have to add a third "when safe" parameter option to MVK_CONFIG_USE_MTLHEAP, and make it the default. We've done that kind of thing before.
And I'll add back some explanatory documentation when documenting that new config option.
And do we also want to make any changes to prefersDedicatedAllocation and requiresDedicatedAllocation, to work better with VMA-style allocation usage? As you and @squidbus mentioned, it might have been a red herring that just happened to trigger an unexpectedly happy workaround, but based on the research here, is there anything meaningful we can do to better handle prefersDedicatedAllocation and requiresDedicatedAllocation?
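The three-option idea above (a global setting resolved per physical device) might look something like the sketch below. The enum, its value names, and `shouldUseMTLHeap` are hypothetical illustrations, not MoltenVK's actual configuration API. The only outside fact used is that 0x1002 is AMD's PCI vendor ID, the kind of value Vulkan reports in VkPhysicalDeviceProperties::vendorID.

```cpp
#include <cstdint>

// Hypothetical three-state setting; the names are illustrative only.
enum class UseMTLHeap { Never, WhenSafe, Always };

// PCI vendor ID for AMD.
constexpr std::uint32_t kVendorIdAMD = 0x1002;

// Resolve the global setting against one physical device.
// "WhenSafe" here means: use MTLHeap unless the GPU is from AMD, which is
// the only vendor reported as affected in this thread.
constexpr bool shouldUseMTLHeap(UseMTLHeap setting, std::uint32_t vendorID) {
    switch (setting) {
        case UseMTLHeap::Never:    return false;
        case UseMTLHeap::Always:   return true;
        case UseMTLHeap::WhenSafe: return vendorID != kVendorIdAMD;
    }
    return false;  // unreachable for valid enum values
}
```

Keeping an explicit "Always" value would still let someone opt in on AMD hardware, for example to re-test after a driver update.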
Thanks @billhollings for responding on this. As you point out, there are two issues going on here: a) MTLHeaps on AMD GPUs (independent of the memory allocator being used), and b) how memory allocators (e.g. VMA, and possibly others) use responses from vkGetImageMemoryRequirements2KHR() and vkGetBufferMemoryRequirements2KHR() to allocate memory.
Then, is a solution to have MoltenVK default to not use MTLHeaps when running on AMD?
I think this is the simplest and most robust solution and what I have implemented manually as an override in RBDoom3BFG. Note that at least one sample fails on AMD in the Vulkan-Samples project (terrain_tessellation) because no manual override has been applied there. So having MoltenVK default to the correct behaviour on AMD would be a good thing.
And do we also want to make any changes to prefersDedicatedAllocation and requiresDedicatedAllocation, to work better with VMA-style allocation usage?
This is a bit trickier since if you implement the solution above, then changes here should not be required. However, if the changes above are not implemented, then forcing prefersDedicatedAllocation and requiresDedicatedAllocation to be true for AMD GPUs will cause VMA to ask for dedicated memory, and MoltenVK will then respond by providing a dedicated allocation which I believe avoids MTLHeaps (to be confirmed by you and @cdavis5e). So the end result is the same, but this latter approach depends on the allocator doing the right thing. You could still get into trouble if you allocate memory manually, without an allocator like VMA, and do not ask for dedicated allocations.
I have validated both of these solutions (for the second approach I simply set _requiresDedicatedMemoryAllocation = true inside MoltenVK for the test). I would appreciate @cdavis5e commenting here since he is clearly more familiar with the root cause of this problem and should approve the strategy to fix.
PR #2509 fixes this by disabling MTLHeaps on AMD GPUs by default, while enabling MTLHeaps on other GPUs by default.
Please test latest MoltenVK and close this issue if the problem is resolved.
Tested using PR #2509 and the solution works fine against RBDoom3BFG and the Vulkan-Samples project.
Thanks @billhollings for fixing.