FFXIV fails to run on Mac with > 100GB RAM

Open cbackas42 opened this issue 2 years ago • 26 comments

This is a fairly weird issue, so please bear with me. Final Fantasy XIV exhibits the following behavior:

  • Upon startup, it tries to allocate a very large surface from Metal. I don't know the full details of this surface (or if it's a single one or multiple), but I do know that it's 16Kx16K with 2048 Layers - the total allocation ends up being somewhere around 80GB of RAM.
  • On most Macs this allocation fails, having insufficient RAM available to fulfill the request. MVK reports the command buffer failure and things continue.
  • On Macs with sufficient RAM (Studio Ultra 128GB for instance) the allocation succeeds, and the game proceeds to run some vertex shader on this surface. Said shader takes > 12ms to process a single vertex, so the firmware restarts the GPU.

This doesn't seem like it's MVK's fault per se, but we're looking for ways to work around it, such as a build setting or environment variable that limits the maximum surface size. The thinking is that if we can induce the same failure we see on lower-RAM machines, the game would work on the higher-RAM machines, even if the defect is in the game itself.

Can you suggest places in the MVK code base I could target to try out hacks? I have actual hardware to test on.

cbackas42 avatar Jul 16 '22 23:07 cbackas42

I have to say...this is pretty bizarre. 😲

Aside from the fundamental question of why the heck this is happening in a game...from my calculations, the math doesn't seem to make sense. A surface of 16K x 16k x 2k is 512 Gigapixels. You don't indicate which surface format you're using, but any format has got to be way beyond even your 128GB memory availability too.
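The arithmetic above can be checked with a short C++ sketch. The function names and the 4-bytes-per-texel figure are illustrative assumptions; the thread does not say which format the surface uses.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kGiB = 1ull << 30;

// Total texel count for a layered surface: width * height * layers.
// 64-bit arithmetic is required: 16384 * 16384 * 2048 overflows 32 bits.
std::uint64_t texelCount(std::uint64_t width, std::uint64_t height, std::uint64_t layers) {
    return width * height * layers;
}

// Bytes needed if every texel is fully backed by memory.
// Assumption: no compression and no sparse/lazy backing.
std::uint64_t surfaceBytes(std::uint64_t width, std::uint64_t height,
                           std::uint64_t layers, std::uint64_t bytesPerTexel) {
    assert(bytesPerTexel > 0);
    return texelCount(width, height, layers) * bytesPerTexel;
}
```

Even at one byte per texel, a fully backed 16384 x 16384 x 2048 surface would need 512 GiB, far more than the roughly 80GB reported, so the observed allocation cannot be one fully backed texture of those dimensions.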

Typically, the surface itself is created from an existing CAMetalLayer (or more accurately one per swapchain image). So this allocation will be happening outside MoltenVK anyway. Plus it's hard to see how that could be a 3D texture.

Can you get a handle on where this initial giant surface is coming from? Surely it's a bug in the game itself, or something?

billhollings avatar Jul 19 '22 19:07 billhollings

Yup. Pretty bizarre is about right!

I agree the math doesn't make a ton of sense. All I know from debugging the shader timeout was that those are the dimensions of the surface it was operating on at the time, and also that I can see ~80GB of RAM become "Wired" by the metal driver around this time.

I'm not super familiar with how MVK works. In this case it's DXVK->MVK, so I assumed all the "Metal" allocations would happen at the MVK layer. You're saying it may in fact be DXVK creating the surface?

I agree it's probably a game bug. Or, perhaps they're doing something that semantically makes SOME sense in DirectX and doesn't cause giant allocations, but one of the translation layers has a different interpretation. I'm not sure at this point. As a first step I was looking for a spot I could try to add a hack to cause the allocation to fail like it does on the lower-spec machines so that the game could launch.

cbackas42 avatar Jul 19 '22 20:07 cbackas42

You could try inserting a check for an unreasonable value of pCreateInfo->imageExtent prior to this line:

https://github.com/KhronosGroup/MoltenVK/blob/60b2ae51ddb87617beb6d8cb7fac11e1daed763e/MoltenVK/MoltenVK/GPUObjects/MVKSwapchain.mm#L345

and set something like:

setConfigurationResult(reportError(VK_ERROR_OUT_OF_HOST_MEMORY, "vkCreateSwapchainKHR(): Swapchain surface size (%d, %d) is unsupportably large.", pCreateInfo->imageExtent.width, pCreateInfo->imageExtent.height));
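The "unreasonable value" test itself might look something like the following sketch. `Extent2D` is a hypothetical stand-in for `VkExtent2D` so the example compiles without the Vulkan headers, and the 8192 limit is an arbitrary value chosen for illustration, not a number taken from MoltenVK.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for VkExtent2D (same two fields).
struct Extent2D {
    std::uint32_t width;
    std::uint32_t height;
};

// Assumed limit, for illustration only. A real check would pick a value
// appropriate to the device and the application.
constexpr std::uint32_t kMaxReasonableDimension = 8192;

// Returns true when either dimension exceeds the assumed limit.
bool isUnreasonablyLarge(const Extent2D& extent) {
    return extent.width > kMaxReasonableDimension ||
           extent.height > kMaxReasonableDimension;
}
```

In the hack being suggested, a true result for `pCreateInfo->imageExtent` would be what triggers the `setConfigurationResult(reportError(...))` call.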

billhollings avatar Jul 27 '22 22:07 billhollings

The issue is that when FFXIV draws without render targets, wined3d, and I believe DXVK, create a render pass with maximum possible render area and framebuffer sizes, with the understanding that the actual pixel grid will be determined by the viewport. Apple Silicon GPUs, due to their tiled architecture, nevertheless need to preallocate large amounts of tile memory in this case. For CrossOver, we worked around this by trimming the render target size in the render pass to what is required by the current viewport. It seemed to make more sense to do it at the level of D3D, but I suppose MoltenVK could implement a similar workaround.
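A rough sketch of that trimming idea, in C++, is below. This is not the CrossOver code: `Extent2D` and `Viewport` are hypothetical stand-ins for the Vulkan types, `trimToViewport` is an invented name, and the sketch assumes a viewport with non-negative width and height.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins for the Vulkan extent and viewport types.
struct Extent2D { std::uint32_t width; std::uint32_t height; };
struct Viewport { float x; float y; float width; float height; };

// Clamps one viewport edge into [0, limit], rounding up to whole pixels.
std::uint32_t clampEdge(float edge, std::uint32_t limit) {
    if (!(edge > 0.0f)) return 0;                        // also catches NaN
    if (edge >= static_cast<float>(limit)) return limit; // never grow past the original size
    return static_cast<std::uint32_t>(std::ceil(edge));
}

// Shrinks a maximum-size render target so it only covers the viewport's
// right and bottom edges.
Extent2D trimToViewport(const Extent2D& maxExtent, const Viewport& vp) {
    return Extent2D{clampEdge(vp.x + vp.width, maxExtent.width),
                    clampEdge(vp.y + vp.height, maxExtent.height)};
}
```

With a 16384 x 16384 maximum and a 1920 x 1080 viewport at the origin, the trimmed render target is 1920 x 1080, which is the size the tiler would then have to cover.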

js6i avatar Aug 03 '22 15:08 js6i

The issue is that when FFXIV draws without render targets, wined3d, and I believe DXVK, create a render pass with maximum possible render area and framebuffer sizes, with the understanding that the actual pixel grid will be determined by the viewport. Apple Silicon GPUs, due to their tiled architecture, nevertheless need to preallocate large amounts of tile memory in this case. For CrossOver, we worked around this by trimming the render target size in the render pass to what is required by the current viewport. It seemed to make more sense to do it at the level of D3D, but I suppose MoltenVK could implement a similar workaround.

This is interesting information! If the fix is better done in DXVK, that's fine, but what version of CrossOver works around this? I ask because the Square Enix "official" launcher has the same bug present, so presumably not that one. Is it in recent CrossOver releases? Is the patch available or submitted upstream?

cbackas42 avatar Aug 03 '22 18:08 cbackas42

Sorry, it's very recent and not officially shipped anywhere yet. It's to be included in the next CrossOver release. Probably too hacky to go to upstream, in the current form anyway.

js6i avatar Aug 04 '22 06:08 js6i

Sounds good, I look forward to that change appearing at some point!

I tried @billhollings's suggestion above, selectively calling setConfigurationResult() with an error, but it just resulted in "A DirectX Error has occurred". It's unclear how this error gets handled gracefully on lower-spec machines in the first place, but I suspect it's a very narrowly targeted hack someplace.

I wonder though if someone could suggest a hack for MVK that's along the lines of what @js6i is suggesting. Because, if you could do the same thing in MVK it should solve this problem for both WineD3D and DXVK at once.

cbackas42 avatar Aug 05 '22 21:08 cbackas42

[redacted] That sounds more like a Wine/CrossOver issue. Have you contacted CodeWeavers support? (UPDATE: Edited to redact deactivated username at user's request.)

cdavis5e avatar Nov 17 '22 21:11 cdavis5e

Update here; for our purposes in FFXIV we were able to fix this with a small hack to DXVK, the diff of which can be found here: https://github.com/Gcenx/DXVK-macOS/pull/3

The hack itself was "inspired" by a similar change CX had done in WineD3D for the same reason. It's unclear to me whether this indicates something MVK could do on behalf of its clients so they didn't all need a change - since as an outsider at least it seems like "This is needed only for Metal devices" should land more in MVK's court. But it's very likely I don't know what I'm talking about either so I'll leave whether to close this or not to you folks who know the stacks better.

cbackas42 avatar Dec 06 '22 00:12 cbackas42

Thinking about how this could be handled in MoltenVK...

@js6i

The issue is that when FFXIV draws without render targets, wined3d, and I believe DXVK, create a render pass with maximum possible render area and framebuffer sizes, with the understanding that the actual pixel grid will be determined by the viewport. Apple Silicon GPUs, due to their tiled architecture, nevertheless need to preallocate large amounts of tile memory in this case. For CrossOver, we worked around this by trimming the render target size in the render pass to what is required by the current viewport. It seemed to make more sense to do it at the level of D3D, but I suppose MoltenVK could implement a similar workaround.

  1. To do this, are you trimming the render target size in vkCmdBeginRenderPass()? If not, how then?
  2. Under what conditions are you doing this? When the frame buffer has no attachments?
  3. How is the viewport size known at trimming time? Pipelines and viewports can be bound inside a renderpass.

@cbackas42

for our purposes in FFXIV we were able to fix this with a small hack to DXVK

The fix you identify sets VkFramebufferCreateInfo::layers to 1 when there are no attachments. MoltenVK could certainly do this logic internally, but presumably this will still allocate a significant chunk of unused memory to cover one very large texture of only one layer (as opposed to I guess the 2048 layers above).
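As a sketch of what that logic amounts to (not the actual DXVK-macOS diff), the following C++ uses a hypothetical, trimmed-down stand-in for `VkFramebufferCreateInfo` that keeps only the fields relevant here. The function name is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for VkFramebufferCreateInfo.
struct FramebufferInfo {
    std::uint32_t attachmentCount;
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t layers;
};

// When the framebuffer has no attachments, force the layer count to 1.
// Width and height are left untouched, so a single maximum-size layer
// is still described.
FramebufferInfo clampLayersWithoutAttachments(FramebufferInfo info) {
    if (info.attachmentCount == 0) {
        info.layers = 1;
    }
    return info;
}
```

Because width and height are unchanged, this reduces the layer count from 2048 to 1 but does not address the per-layer size.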

billhollings avatar Dec 09 '22 16:12 billhollings

@billhollings you can view their hack via my mirror; see wined3d/context_vk.c

Gcenx avatar Dec 09 '22 16:12 Gcenx

you can view their hack via my mirror

Thanks. A similar approach to @cbackas42, but also limiting render area.

The concern I have about putting this into MoltenVK is the question I raised above about generically trimming to whatever viewport is active at the time the render pass begins. It's possible that no pipeline or viewport is established at the time this decision is taken, or that a different viewport will be established later in the renderpass. Ordering renderpass, pipelines, and viewports is something the app (or DXVK) might have control over, but MoltenVK doesn't.

However...I wonder if MoltenVK could use attachment status to control setting the values of MTLRenderPassDescriptor properties renderTargetWidth, renderTargetHeight, and renderTargetArrayLength, which I assume may be what Metal uses to allocate the tile memory cache.

billhollings avatar Dec 09 '22 18:12 billhollings

but presumably this will still allocate a significant chunk of unused memory to cover one very large texture of only one layer (as opposed to I guess the 2048 layers above).

Yes, this is very likely true. I'm not presenting our fix as "the best" or even "correct", just what worked for us for our target game. None of us are very familiar with the graphics APIs here - we changed the layer count because it was easily accessed and we couldn't figure out how to determine the viewport size from that spot to more closely mimic the CX hack.

But I wouldn't present this as a good "fix" to upstream or anything without knowing for certain that there wouldn't be a valid application of > 1 layer without attachments. I rather suspect our fix would break some programs.

But the root idea of course is that "default to the largest possible size" isn't especially safe on Apple Silicon because it might ACTUALLY try to allocate that!

cbackas42 avatar Dec 09 '22 19:12 cbackas42

However...I wonder if MoltenVK could use attachment status to control setting the values of MTLRenderPassDescriptor properties renderTargetWidth, renderTargetHeight, and renderTargetArrayLength, which I assume may be what Metal uses to allocate the tile memory cache.

@cbackas42 @js6i @Gcenx

With that in mind, can someone try applying the following small patch to MoltenVK to see if it fixes the tiling memory over-allocation, please?

no_preset_render_size.patch.zip

The patch is just commenting out the following 4 lines in MVKCommandBuffer.mm, so that Metal is not pre-warned how big the rendering area is. I'm hoping this changes how it allocates tiling memory. If it works, I'll modify it to only do so when there are no attachments in the framebuffer.

https://github.com/KhronosGroup/MoltenVK/blob/a307b24001b0b4cc36ca3fcdb28dca135e95c280/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm#L526-L528

https://github.com/KhronosGroup/MoltenVK/blob/a307b24001b0b4cc36ca3fcdb28dca135e95c280/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm#L552

billhollings avatar Dec 09 '22 19:12 billhollings

Thanks Bill; I went back to an older release of XIV On Mac with a DXVK prior to when we'd applied our patch, and dropped in an MVK built with your patches. It appears to be just as effective, at least for FFXIV. The game launches and works without issue, no giant memory allocation or kernel panic!

cbackas42 avatar Dec 10 '22 07:12 cbackas42

This is now fixed in PR #1797. Please retest using latest MoltenVK, and close this issue if it fixes the problem.

billhollings avatar Dec 12 '22 00:12 billhollings

Tested on a Studio Ultra 128GB on the unpatched DXVK; the problem appears to be solved! Thank you so much!

cbackas42 avatar Dec 12 '22 01:12 cbackas42

@cbackas42

Thanks for testing. However, it turns out that the fix in #1797 causes the frag shader not to run.

I'm curious. What is going on in FFXIV On Mac during these attachment-free renders, and why does the applied fix not cause behavioral issues if the frag shader is not being run?

billhollings avatar Dec 13 '22 19:12 billhollings

Oh no! It was too good to be true...

The game is in early startup. With no fixes in place, the giant allocation happens before anything is drawn at all, even before the first studio logo appears. If you have, say, DXVK overlays on, you do see the very first frame of that, so I believe it dies either during or after the very first frame drawn. My assumption has always been that it actually configures this thing later, just lazily, but I'm off into wild guesses at that point.

cbackas42 avatar Dec 13 '22 19:12 cbackas42

However...I wonder if MoltenVK could use attachment status to control setting the values of MTLRenderPassDescriptor properties renderTargetWidth, renderTargetHeight, and renderTargetArrayLength, which I assume may be what Metal uses to allocate the tile memory cache.

@cbackas42 @js6i @Gcenx

With that in mind, can someone try applying the following small patch to MoltenVK to see if it fixes the tiling memory over-allocation, please?

no_preset_render_size.patch.zip

The patch is just commenting out the following 4 lines in MVKCommandBuffer.mm, so that Metal is not pre-warned how big the rendering area is. I'm hoping this changes how it allocates tiling memory. If it works, I'll modify it to only do so when there are no attachments in the framebuffer.

https://github.com/KhronosGroup/MoltenVK/blob/a307b24001b0b4cc36ca3fcdb28dca135e95c280/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm#L526-L528

https://github.com/KhronosGroup/MoltenVK/blob/a307b24001b0b4cc36ca3fcdb28dca135e95c280/MoltenVK/MoltenVK/Commands/MVKCommandBuffer.mm#L552

@billhollings @cdavis5e & @js6i does the issue also happen if this attached change is applied instead of #1797

Gcenx avatar Dec 13 '22 19:12 Gcenx

does the issue also happen if this attached change is applied instead of #1797

Unfortunately, yes, the issue will happen. Both are actually the same fix. #1797 is just an industrial version of that patch (applying it only if there are no attachments).

billhollings avatar Dec 13 '22 19:12 billhollings

does the issue also happen if this attached change is applied instead of #1797

Unfortunately, yes, the issue will happen. Both are actually the same fix. #1797 is just an industrial version of that patch (applying it only if there are no attachments).

Ah I misunderstood nvm, I’ll remove the revert on the DXVK-macOS.

Gcenx avatar Dec 13 '22 20:12 Gcenx

@Gcenx

Ah I misunderstood nvm, I’ll remove the revert on the DXVK-macOS.

I've pushed PR #1802, which does some more sophisticated management of Metal renderpasses, and fixes the issue here in a way that always sets the Metal render area to something: the frame buffer render area, or the viewport, if there are no attachments. Since the viewport is not guaranteed until the draw call, we now defer creating the Metal renderpass until then (or until needed by other renderpass operations that involve drawing, like clearing attachments).
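The deferral described here can be modeled with a small C++ sketch. None of these names come from MoltenVK; `DeferredRenderPass` is a hypothetical class that only illustrates the ordering: the render area is resolved at the first draw, using the framebuffer area when there are attachments and the viewport otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 2D extent.
struct Extent2D { std::uint32_t width; std::uint32_t height; };

// Hypothetical model of a render pass whose Metal-side setup is deferred
// until the first draw call.
class DeferredRenderPass {
public:
    DeferredRenderPass(std::uint32_t attachmentCount, Extent2D framebufferArea)
        : attachmentCount_(attachmentCount), framebufferArea_(framebufferArea) {}

    // The viewport may be set at any point before the draw.
    void setViewport(Extent2D viewport) { viewport_ = viewport; }

    // The render area is chosen only on the first draw, when the viewport
    // is known. Later draws reuse the already-resolved area.
    Extent2D draw() {
        if (!started_) {
            resolvedArea_ = (attachmentCount_ > 0) ? framebufferArea_ : viewport_;
            started_ = true;
        }
        return resolvedArea_;
    }

    bool started() const { return started_; }

private:
    std::uint32_t attachmentCount_;
    Extent2D framebufferArea_;
    Extent2D viewport_{0, 0};
    Extent2D resolvedArea_{0, 0};
    bool started_ = false;
};
```

In this model, an attachment-free pass with a 16384 x 16384 framebuffer and a 1920 x 1080 viewport resolves to 1920 x 1080 at the first draw, and nothing is set up before that draw.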

Can you test this again with the new code, and without the DXVK-macOS fix, and let me know the results, please?

billhollings avatar Dec 22 '22 05:12 billhollings

Sorry for the delay; I just tried the latest build of the current version of the PR, and it seems to work just as well on my 128GB Studio Ultra with FFXIV as the initial change did.

cbackas42 avatar Dec 23 '22 22:12 cbackas42

Sorry for the delay; I just tried the latest build of the current version of the PR, and it seems to work just as well on my 128GB Studio Ultra with FFXIV as the initial change did.

Thanks for following up on that. And no worries about the pause.

To anyone reading this discussion, please note that after two attempts, MoltenVK does not have a fix for this. After some discussion, PR #1797 has been reverted, as it breaks required behavior, and PR #1802 is on hold as WIP at this point, as it is a complicated solution to a problem that seems better handled at the app or emulator level, as was done above.

billhollings avatar Dec 27 '22 18:12 billhollings

For the big boss here, can you delete my old account's posts here? [ghost] the images I posted have sensitive info, and when people type my name on google, they end up seeing these images...woops 😥

I've deleted the two ghost posts above.

billhollings avatar Apr 23 '24 20:04 billhollings