New features available in Metal 4

Open squidbus opened this issue 6 months ago • 35 comments

Metal 4 was just announced at WWDC 2025 with some new features that may be of interest to MoltenVK.

Updated feature set table: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf

Of particular interest:

  • Placement sparse textures and buffers
  • Command barriers

squidbus avatar Jun 09 '25 19:06 squidbus

This site seems to give a good overview of all new Metal 4 features and API changes: https://developer.apple.com/documentation/metal/understanding-the-metal-4-core-api The opening sentence of that website makes me happy somehow, since Apple acknowledges the existence of DX and Vulkan explicitly:

Metal 4 improves runtime performance and memory efficiency through its underlying implementations, while making it easier to adapt your apps and games from other platforms, such as DirectX and Vulkan.

  • Metal 4 is based on C++17, and allows lambdas
  • Most new types have been moved into a MTL4 namespace, and some newer versions of existing types have been moved here, e.g. MTL4RenderPass, or MTL4Pipeline.
  • Asynchronous compilation using MTL4Compiler
  • MTL4Archive looks like a replacement for MTLBinaryArchive, however this still seems to take in a URL instead of bytes, sadly.
  • Tensors, which are used with the MTL4MachineLearningPipeline
  • All of these types seem to make it easier to model the Vulkan command model on top of Metal, in every way shape and form. It gets rid of specialized encoders such as MTLBlitCommandEncoder and MTLAccelerationStructureCommandEncoder and fuses them together into one MTL4ComputeCommandEncoder.
  • The command buffer model also now more closely follows the Vulkan model, with a MTL4CommandBuffer, MTL4CommandQueue, and MTL4CommandAllocator, with the last serving effectively as a VkCommandPool, while the other two are VkCommandBuffer and VkQueue, respectively.
  • The concept of descriptor sets now exists with MTL4ArgumentTable, which avoids having to set the bindings every time you encode a command buffer
  • Automatic hazard tracking has been removed for all the MTL4 pipelines/command encoders. People now need to perform manual synchronization using the barrier methods
  • Actual texture views using MTLTextureViewPool
  • Metal uses "macOS 16" and "iOS 19" everywhere lol

spnda avatar Jun 09 '25 19:06 spnda

This would be a great opportunity to clean up MoltenVK:

  • Remove support for x86 Macs (so all the managed resource mess can be removed)
  • Always use argument buffers
  • Remove support for automatic resource tracking
  • Move over to Metal 4
  • Always encode when the equivalent Vulkan functions get called rather than offering 3 different modes (encode immediately, encode in vkCmdEndCommandBuffer, encode at submission time)

I don't know how feasible this is for MoltenVK users though.

K0bin avatar Jun 09 '25 20:06 K0bin

I don't know if non-Apple GPUs need to be supported further, or whether we could just drop them and direct users to the last supported version.

But for Metal 4, I think the challenge will be continuing to support a reasonable range of OS versions. So there may be some mess supporting both paths for a while.

squidbus avatar Jun 09 '25 20:06 squidbus

Also I wanted to mention that it seems like beyond residency sets there's no other option to convey residency information to the Metal driver, meaning residency sets are now required to be used with any of the MTL4 stuff.

Chiming in on the cleanup/rewrite thing: I could very well see the time coming where we drop support for Metal 1 and 2 and Intel GPUs in a newer version, but keep support for Metal 3. All Apple Silicon chips on macOS will support Metal 4, so it's only a question of supporting old mobile devices.

As I mentioned before, I am generally very much for a MoltenVK that is based on the newest Metal version by default, and only falls back to using Metal 3 (or older) for OSes or devices that do not support these. This should allow us to be able to provide the best possible performance and compatibility for up-to-date devices. Especially in this case, I believe if we do not switch to the new Metal 4 command and binding APIs, MoltenVK could have major performance issues from using the older APIs, especially on any new OSs or Apple GPUs that are likely tailored towards this API over the old ones. Furthermore, the day when Metal 1 and Metal 2 are deprecated or even unusable is probably not too far away given the recent reworks of Metal 3 and Metal 4. I think it's important to remember that Metal 1 is from 2014 and carries a lot of technical debt with it.

spnda avatar Jun 09 '25 21:06 spnda

Metal 3 sounds like a reasonable target to me at this point. Since you seem to have looked into the new APIs a fair bit, do you have an idea of how challenging it will be to support both Metal 3 and 4 in the same codebase? In terms of how much logic can be reused between the two versus completely separate code paths and resource management, for example. With all the new structures and ways of doing things being more aligned with other APIs, I'm not sure how much overlap there is now.

squidbus avatar Jun 09 '25 21:06 squidbus

do you have an idea of how challenging it will be to support both Metal 3 and 4 in the same codebase

Metal 4 has new protocols for:

  • MTL4CommandQueue replaces MTLCommandQueue
  • MTL4CommandBuffer replaces MTLCommandBuffer
  • MTL4RenderCommandEncoder replaces MTLRenderCommandEncoder
  • MTL4ComputeCommandEncoder replaces MTLComputeCommandEncoder, MTLBlitCommandEncoder, MTLAccelerationStructureCommandEncoder.

So it would come with significant maintenance overhead to have two code paths for all of that. Probably easier and cleaner to even keep those in different code bases or branches.

K0bin avatar Jun 09 '25 23:06 K0bin

Ah, I kind of expected the new versions to at least subclass the old versions, but it appears not.

squidbus avatar Jun 09 '25 23:06 squidbus

Should be doable though as long as you'd go all-in on Argument buffers. That would be shared between the Metal 3 command buffer implementation and the Metal 4 one.

Besides that MoltenVK could easily make up for the extra maintenance overhead by removing the encode-at-endCmdBuffer and encode-at-submit modes + all the command serialization.

K0bin avatar Jun 09 '25 23:06 K0bin

Related to the Intel GPU conversation, it appears that in the Platform State of the Union, they announced that Tahoe would be the final major update for any Intel Mac.

squidbus avatar Jun 09 '25 23:06 squidbus

There's more. Metal 4 doesn't support D3D11-style tessellation. You have to use mesh shaders. We need to finish up the implementation of vertex-pipeline shaders on top of Mesh shaders to be able to use Metal 4.

cdavis5e avatar Jun 10 '25 00:06 cdavis5e

There's more. Metal 4 doesn't support D3D11-style tessellation. You have to use mesh shaders. We need to finish up the implementation of vertex-pipeline shaders on top of Mesh shaders to be able to use Metal 4.

The docs seem to indicate you can mix MTL and MTL4 usage to incrementally migrate, so I assume at worst you could still fall back to older constructs to build a tessellation pipeline, while still using newer stuff where possible.

squidbus avatar Jun 10 '25 00:06 squidbus

The docs seem to indicate you can mix MTL and MTL4 usage to incrementally migrate, so I assume at worst you could still fall back to older constructs to build a tessellation pipeline, while still using newer stuff where possible.

You'd have to end the Metal 4 command buffer, start a Metal 3 command buffer, do the tessellated draw, end the Metal 3 command buffer, start a new Metal 4 command buffer.
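
A rough sketch of what that splitting implies, in plain C++ with no Metal calls. `Draw`, `Segment` and `splitIntoSegments` are hypothetical names for illustration, not MoltenVK or Metal API; the only assumption taken from the thread is that tessellated draws need the older command buffer type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which command buffer flavour a run of draws has to be encoded into.
enum class Backend { Metal4, Metal3 };

// Hypothetical recorded draw; only the property that forces a split is modelled.
struct Draw { bool usesTessellation; };

// A contiguous run of draws that can share one command buffer.
struct Segment {
    Backend backend;
    size_t firstDraw;
    size_t drawCount;
};

// Walks the draws in order and starts a new segment (a new command buffer)
// every time the required backend changes.
std::vector<Segment> splitIntoSegments(const std::vector<Draw>& draws) {
    std::vector<Segment> segments;
    for (size_t i = 0; i < draws.size(); ++i) {
        Backend needed = draws[i].usesTessellation ? Backend::Metal3 : Backend::Metal4;
        if (segments.empty() || segments.back().backend != needed) {
            segments.push_back({needed, i, 0});
        }
        ++segments.back().drawCount;
    }
    return segments;
}
```

A single tessellated draw in the middle of a frame turns one command buffer into three, which is where the overhead of mixing the two APIs would come from.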

K0bin avatar Jun 10 '25 00:06 K0bin

There's more. Metal 4 doesn't support D3D11-style tessellation. You have to use mesh shaders. We need to finish up the implementation of vertex-pipeline shaders on top of Mesh shaders to be able to use Metal 4.

Finish up?

Has it been started somewhere? 👀

billhollings avatar Jun 10 '25 00:06 billhollings

I'm wondering if the new flexible pipeline state feature would allow us to get an approximation of VK_EXT_graphics_pipeline_library or even VK_EXT_shader_object? That would be really helpful.

From my quick scan, in both cases, it seems the main sticking point will mostly be the vertex input state info content in VkPipelineVertexInputStateCreateInfo.

And it looks like final compiled pipelines are always required, so we'd be looking at some kind of internal hash-n-cache design of fully-compiled pipelines. But at least the final compilations would be much faster, and cause fewer hitches.
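
A minimal sketch of the hash-n-cache idea, in plain C++ with no Metal calls. `PipelineKey`, `CompiledPipeline` and `PipelineCache` are hypothetical names, and the compile callback is a stand-in for whatever final compilation step Metal 4 actually requires:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical key: hashes of the separately supplied pieces of pipeline state.
struct PipelineKey {
    uint64_t vertexInputHash;
    uint64_t shaderStagesHash;
    uint64_t fragmentOutputHash;
    bool operator==(const PipelineKey& o) const {
        return vertexInputHash == o.vertexInputHash &&
               shaderStagesHash == o.shaderStagesHash &&
               fragmentOutputHash == o.fragmentOutputHash;
    }
};

struct PipelineKeyHasher {
    size_t operator()(const PipelineKey& k) const {
        // Simple multiply-xor combine, good enough for a sketch.
        uint64_t h = k.vertexInputHash;
        h = h * 1099511628211ull ^ k.shaderStagesHash;
        h = h * 1099511628211ull ^ k.fragmentOutputHash;
        return static_cast<size_t>(h);
    }
};

// Stand-in for a fully compiled pipeline state object.
struct CompiledPipeline { PipelineKey key; };

class PipelineCache {
public:
    using CompileFn = std::function<std::shared_ptr<CompiledPipeline>(const PipelineKey&)>;
    explicit PipelineCache(CompileFn compile) : _compile(std::move(compile)) {}

    // Returns the cached pipeline, compiling only on the first request for a key.
    std::shared_ptr<CompiledPipeline> getOrCompile(const PipelineKey& key) {
        auto it = _cache.find(key);
        if (it != _cache.end()) { return it->second; }
        auto pipeline = _compile(key);
        assert(pipeline && "compile callback must return a pipeline");
        _cache.emplace(key, pipeline);
        return pipeline;
    }

    size_t size() const { return _cache.size(); }

private:
    CompileFn _compile;
    std::unordered_map<PipelineKey, std::shared_ptr<CompiledPipeline>, PipelineKeyHasher> _cache;
};
```

The sketch is single-threaded; a cache that is hit from several recording threads would need a lock or a concurrent map around `getOrCompile`.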

billhollings avatar Jun 10 '25 01:06 billhollings

Thinking out loud on the topic of backwards compatibility...

Metal4 is supported on all Apple Silicon Macs, and as far back as A14 on iOS, which is pretty impressive. And non-AS Macs are now 5 years old. So I don't think hardware is an issue.

Some level of backwards compatibility would be required to support older OS's for the next several years I think. Although game and engine devs could certainly opt to abandon their customers who don't upgrade.

The good news is that the overall class structure within MoltenVK is effectively based on Vulkan, not Metal, which gives us a couple of options.

  1. My first choice would be to keep one main branch, and create *Metal4 subclasses of key objects such as MVKQueue, MVKGraphicsPipeline, MVKCommandEncoder, and maybe MVKDescriptorSet.

    We'd have to require that any improvements and Vulkan extensions that don't explicitly need Metal4 tech be added to the parent classes.

    And maybe it's an MVKConfiguration option to force not using Metal4, say for an app that needs tessellation shader support.

  2. As a second choice (and it's a much lower preference for me), we could possibly spawn a Metal4 branch, on which the underlying Metal objects are replaced with Metal4 objects, but would still be generally synchronizable with the main branch.

    We'd have to require that any improvements and Vulkan extensions that don't explicitly need Metal4 support be added to the main branch, constantly sync main into Metal4, and make the Metal4 branch only about Metal objects.

    And as @cdavis5e points out, to support tessellation shaders, we'd need to figure out mesh shaders as part of it, and build mesh shaders from tessellation shaders. Or maybe that's part of legacy management choice between the two branches that an app dev makes.

    And at some point (years) down the road, when we don't care about legacy at all, we could swap the branches, and rename main to legacy, and Metal4 to main.
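
To make option 1 concrete, here is a minimal sketch of the subclass-plus-factory shape it suggests. Everything in it is hypothetical: `Queue`, `QueueMetal4`, `DeviceCaps`, `Config` and `makeQueue` are illustrative stand-ins, not the real MVKQueue or MVKConfiguration:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for what the device and OS report.
struct DeviceCaps {
    bool supportsMetal4;   // OS and GPU both expose Metal 4
};

// Hypothetical stand-in for a configuration opt-out.
struct Config {
    bool forceLegacyMetal; // e.g. for an app that needs tessellation shaders
};

// Parent class: Vulkan-facing behaviour shared by both paths lives here.
class Queue {
public:
    virtual ~Queue() = default;
    virtual std::string backendName() const { return "Metal3"; }
    // Shared logic that does not depend on the Metal version.
    std::string describe() const { return "queue on " + backendName(); }
};

// Subclass: overrides only what needs Metal 4 objects.
class QueueMetal4 : public Queue {
public:
    std::string backendName() const override { return "Metal4"; }
};

// The factory picks the subclass once, at creation time.
std::unique_ptr<Queue> makeQueue(const DeviceCaps& caps, const Config& cfg) {
    if (caps.supportsMetal4 && !cfg.forceLegacyMetal) {
        return std::make_unique<QueueMetal4>();
    }
    return std::make_unique<Queue>();
}
```

Callers only ever see the parent type, so new Vulkan extensions that don't need Metal 4 land in `Queue` and are inherited by both paths.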

billhollings avatar Jun 10 '25 02:06 billhollings

Adding on to the thread about metal 3: I think it's perfectly reasonable to just support 3 & 4 (and abandon ancient versions), especially given that metal 3 is supported on devices going back all the way to 2017 (such as my imac18,3).

Though abandoning metal 3 seems like an unwise choice, especially if it forces devs to have to use an outdated version of moltenVK that supports metal 3, thereby incurring a performance loss.

vaylor27 avatar Jun 10 '25 02:06 vaylor27

@billhollings

I'm wondering if the new flexible pipeline state feature would allow us to get an approximation of VK_EXT_graphics_pipeline_library or even VK_EXT_shader_object? That would be really helpful.

I think the more interesting way to do those would be to compile MTLFunction when compiling the individual pipeline libraries or shader objects.

Then for the final linked pipeline you have a VS shader and a FS shader that each do nothing except call the dynamically linked functions. That way the final compilation step should still be really fast and you'd decouple vertex shaders from fragment shaders (which is usually more of a problem than additional pipeline state surrounding it).

For shader objects, you'd have to manually compile a statically linked version on a background thread; with VK_EXT_gpl that's the job of the application.

And non-AS Macs are now 5 years old. So I don't think hardware is an issue.

This year's OS update is also the last one for x86 Macs.

Good time to remove the fallback to managed resources. I just hope Apple updates the iOS simulator to no longer be stupid like that for absolutely no reason.

My first choice would be to keep one main branch, and create *Metal4 subclasses of key objects such as MVKQueue, MVKGraphicsPipeline, MVKCommandEncoder, and maybe MVKDescriptorSet.

Isn't that like 60% of MoltenVK? The argument buffer implementation of descriptor sets in MVKDescriptorSet is actually the thing that could be shared besides MVKImage and MVKBuffer.

K0bin avatar Jun 10 '25 03:06 K0bin

Adding on to the thread about metal 3: I think it's perfectly reasonable to just support 3 & 4 (and abandon ancient versions), especially given that metal 3 is supported on devices going back all the way to 2017 (such as my imac18,3).

Though abandoning metal 3 seems like an unwise choice, especially if it forces devs to have to use an outdated version of moltenVK that supports metal 3, thereby incurring a performance loss.

Agreed. My preference above is for option (1), subclassing, which would allow us to keep support for Metal 3.

Even with my option (2), continuing to focus any non-Metal4 code in the main branch would allow Metal3 support to continue to evolve.

billhollings avatar Jun 10 '25 12:06 billhollings

The concept of descriptor sets now exists with MTL4ArgumentTable, which avoids having to set the bindings every time you encode a command buffer

I think MTL4ArgumentTable is the Argument Buffer from Metal 2.0

Andreyogld3d avatar Jun 10 '25 15:06 Andreyogld3d

I think MTL4ArgumentTable is the Argument Buffer from Metal 2.0

No, it's the new way to store the slot bindings and apply multiple at once. Argument Buffers are still used for bindless. I think for MoltenVK the way forward would be to turn descriptor sets into Argument Buffers and bind them to an Argument Table.

The annoying thing is that apparently they got rid of push constants.

K0bin avatar Jun 10 '25 15:06 K0bin

I think for MoltenVK the way forward would be to turn descriptor sets into Argument Buffers and bind them to an Argument Table.

I would agree that this should be the path forward. Since in Vulkan the descriptor set bindings are tied to the command buffer, we'd need to keep around an argument table per pipeline per command buffer, and then map the descriptor set bindings to that argument table. I think this is very simple, and I guess with some caching mechanism for each pipeline layout that could be improved upon again, so that memory isn't allocated constantly. I could see this being a lot more efficient than the current approach.
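
Roughly, the per-layout caching could look like the sketch below. It is plain C++ with no Metal calls, and `ArgumentTable`, `ArgumentTablePool`, `LayoutId` and `bindDescriptorSet` are hypothetical stand-ins; in particular, modelling a table as one GPU address per descriptor set is an assumption made for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

using LayoutId = uint32_t; // hypothetical identifier for a pipeline layout

// Hypothetical table: one slot per descriptor set, holding an argument buffer address.
struct ArgumentTable {
    explicit ArgumentTable(size_t setCount) : setAddresses(setCount, 0) {}
    std::vector<uint64_t> setAddresses;
};

// Recycles tables per pipeline layout so that recording does not allocate constantly.
class ArgumentTablePool {
public:
    std::unique_ptr<ArgumentTable> acquire(LayoutId layout, size_t setCount) {
        auto& freeList = _free[layout];
        if (!freeList.empty()) {
            auto table = std::move(freeList.back());
            freeList.pop_back();
            return table;
        }
        ++_allocations;
        return std::make_unique<ArgumentTable>(setCount);
    }

    // Called when the owning command buffer is reset or destroyed.
    void release(LayoutId layout, std::unique_ptr<ArgumentTable> table) {
        // Clear stale bindings before the table is handed out again.
        std::fill(table->setAddresses.begin(), table->setAddresses.end(), 0);
        _free[layout].push_back(std::move(table));
    }

    size_t allocations() const { return _allocations; }

private:
    std::unordered_map<LayoutId, std::vector<std::unique_ptr<ArgumentTable>>> _free;
    size_t _allocations = 0;
};

// Rough equivalent of binding one descriptor set: record its argument buffer address.
void bindDescriptorSet(ArgumentTable& table, size_t setIndex, uint64_t argumentBufferAddress) {
    assert(setIndex < table.setAddresses.size());
    table.setAddresses[setIndex] = argumentBufferAddress;
}
```

Keying the free list on the layout means a recycled table already has the right number of slots for the pipeline that asks for it.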

The annoying thing is that apparently they got rid of push constants.

Not really, since it was just a buffer that was managed by the driver itself, and just a constant T* in the shader. That can be easily emulated by effectively doing the same the Metal driver did before, and keeping around small buffers that are always bound to the last (I think that's what we used?) entry in the table.
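
As a sketch of that emulation, in plain C++ with no Metal calls: `ArgumentTable`, `PushConstantArena` and `pushConstants` are hypothetical names, and the slot count of 31 and the 16-byte alignment are assumptions chosen for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a binding table: each slot holds a GPU address.
constexpr size_t kSlotCount = 31;                    // assumed table size
constexpr size_t kPushConstantSlot = kSlotCount - 1; // last entry reserved for push constants

struct ArgumentTable {
    std::array<uint64_t, kSlotCount> slots{};
};

// Small bump allocator over one buffer, standing in for the storage the
// driver used to manage on the application's behalf.
class PushConstantArena {
public:
    PushConstantArena(uint64_t baseAddress, size_t capacity)
        : _base(baseAddress), _storage(capacity) {}

    // Copies the data into the arena and returns the address of the copy.
    uint64_t upload(const void* data, size_t size) {
        size_t offset = (_head + kAlign - 1) & ~(kAlign - 1);
        assert(offset + size <= _storage.size() && "arena exhausted");
        std::memcpy(_storage.data() + offset, data, size);
        _head = offset + size;
        return _base + offset;
    }

private:
    static constexpr size_t kAlign = 16; // assumed alignment
    uint64_t _base;
    std::vector<uint8_t> _storage;
    size_t _head = 0;
};

// Rough equivalent of vkCmdPushConstants: snapshot the data, then point the
// reserved slot at the snapshot.
void pushConstants(ArgumentTable& table, PushConstantArena& arena,
                   const void* data, size_t size) {
    table.slots[kPushConstantSlot] = arena.upload(data, size);
}
```

Each push gets its own copy, so draws that were already encoded keep seeing the values that were current when they were recorded.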

I genuinely want to work on a branch that reworked the entire command queue/encoding stuff to use MTL4 stuff by default, and fallback to the rest, though I simply just don't have time to work on this in any capacity, except for maybe testing out other people's ideas or PRs.

My first choice would be to keep one main branch, and create *Metal4 subclasses of key objects such as MVKQueue, MVKGraphicsPipeline, MVKCommandEncoder, and maybe MVKDescriptorSet.

As I said in my first comment on this issue, this feels a little backwards to me. I doubt that Apple will rework the entire queue, command buffer and command encoding interface any time soon, and given that the old system will probably rarely be used going forward, I think we should use the Metal 4 implementations by default assuming they're available. The naming is fine, I guess, however the old functionality should probably be implemented as specializations behind the Metal 4 stuff as a fallback imho.

spnda avatar Jun 10 '25 17:06 spnda

Does all this mean that MoltenVK will implement VK_EXT_descriptor_buffer and VK_EXT_shader_object anytime soon?

QuantumDeveloper avatar Jun 10 '25 17:06 QuantumDeveloper

Does all this mean that MoltenVK will implement VK_EXT_descriptor_buffer

The problem with VK_EXT_descriptor_buffer was never the descriptor API, as discussed in #1776. The problem was figuring out residency; however, I think a full residency set based approach would address that fairly well. I will mention, though, that MTL4ArgumentTable does make implementing descriptor buffer a lot easier, since it allows binding buffers by their address instead of their handle. Therefore the address <-> buffer handle bookkeeping can be completely removed. If I remember correctly, this was also a problem with the implementation for VK_KHR_acceleration_structure.
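
For context, the kind of address <-> handle bookkeeping meant here might look like the sketch below. It is an illustration only, not MoltenVK's actual code; `BufferHandle`, `Resolved` and `AddressRegistry` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

using BufferHandle = uint32_t; // hypothetical stand-in for a buffer object

// Result of mapping a raw GPU address back to the buffer that contains it.
struct Resolved {
    BufferHandle handle;
    uint64_t offset;
};

// Tracks every buffer's address range so that an address supplied by the
// application can be turned back into (buffer, offset) at bind time.
class AddressRegistry {
public:
    void add(uint64_t base, uint64_t size, BufferHandle handle) {
        _ranges[base] = Range{size, handle};
    }

    void remove(uint64_t base) { _ranges.erase(base); }

    std::optional<Resolved> resolve(uint64_t address) const {
        // Find the last range whose base is <= address.
        auto it = _ranges.upper_bound(address);
        if (it == _ranges.begin()) { return std::nullopt; }
        --it;
        uint64_t offset = address - it->first;
        if (offset >= it->second.size) { return std::nullopt; }
        return Resolved{it->second.handle, offset};
    }

private:
    struct Range { uint64_t size; BufferHandle handle; };
    std::map<uint64_t, Range> _ranges; // keyed by base address
};
```

If bindings can be made directly by address, a lookup like `resolve` no longer has to run for every bind, and the registry has no reason to exist.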

I guess this would be another big reason to have a Metal 4 centric implementation, since it makes VK_EXT_descriptor_buffer a lot easier to implement, and probably allows other descriptor extensions to be implemented more easily or at all. Again I want to strongly advocate for a Metal 4 rework of the command encoding, which will also force us to only rely on residency sets which should be a lot more optimal as Apple have said themselves.

I feel like the current design philosophy of keeping backwards compatibility as much as possible is tying our feet more than it actually has any gains. @billhollings I would be very interested in some kind of statistics that show what OS versions and what GPUs are most commonly used together with MoltenVK. It's been 5 years since M1, and even longer for the A-series, so I would guess that from a market share perspective these carry probably around 80% of users? There's a point where keeping backwards compatibility is just wasteful. stares at Windows.

spnda avatar Jun 10 '25 18:06 spnda

Given MoltenVK is being rewritten right now anyway, this seems like a sensible split for "classic" moltenvk (this repo) to be metal 3 and keep support for older hw and for the new implementation to support only Metal 4.

alyssarosenzweig avatar Jun 10 '25 19:06 alyssarosenzweig

It's being rewritten?

K0bin avatar Jun 10 '25 19:06 K0bin

aren't the mesa guys wanting to rewrite it?

vaylor27 avatar Jun 10 '25 20:06 vaylor27

There's more. Metal 4 doesn't support D3D11-style tessellation. You have to use mesh shaders. We need to finish up the implementation of vertex-pipeline shaders on top of Mesh shaders to be able to use Metal 4.

Finish up?

Has it been started somewhere? 👀

CodeWeavers has their implementation of geometry shaders, which is languishing (see KhronosGroup/SPIRV-Cross#2200). And, I have an incomplete implementation of vertex shaders on top of mesh shaders, which still needs testing and fleshing out.

cdavis5e avatar Jun 10 '25 21:06 cdavis5e

@spnda

I doubt that Apple will rework the entire queue, command buffer and command encoding interface any time soon, and given that the old system will probably rarely be used going forward, I think we should use the Metal 4 implementations by default assuming they're available.

I'm not sure I understand what you're saying, or what this has to do with my suggestion of a design that will use subclassing of key MoltenVK objects. Can you elaborate on your concerns? The "assuming they're available" part is what I'm trying to address with a subclass design.

The problem with VK_EXT_descriptor_buffer was never the descriptor API, as discussed in https://github.com/KhronosGroup/MoltenVK/issues/1776. The problem was figuring out residency, however I think a full residency set based approach would address that fairly well.

Full residency exists in MoltenVK already. It was added a little while ago.

It's been 5 years since M1, and even longer for the A-series, so I would guess that from a market share perspective these carry probably around 80% of users? There's a point where keeping backwards compatibility is just wasteful. stares at Windows.

As I said above, I don't believe legacy is about hardware, just OS versions. MoltenVK is used by game engines, emulators, a lot of games, and significant non-game software like Autodesk. All those devs will probably not be in a situation to tell their users to "upgrade to *OS 26, or die", for at least the next several years. 😉

@K0bin

It's being rewritten?

@vaylor27

aren't the mesa guys wanting to rewrite it?

What @alyssarosenzweig said is misleading. She means that Mesa is developing an Apple-Silicon-only Metal back end. The Mesa Metal back-end is under development, and is expected sometime at the end of this year. It is not connected with MoltenVK at all, and will be a second Vulkan-on-Metal option. Her assumption is that it will make MoltenVK obsolete, except for Intel hardware (and maybe Metal3).

billhollings avatar Jun 10 '25 22:06 billhollings

Just sharing my 2 cents..

As a user I would be leaning towards a dxvk approach: a MoltenVK 1.x branch and a MoltenVK 2.x branch. Ideally stuff implemented in one could be transferred to the other.

As a developer, someone who has dealt with similar stuff before, that idea is very nice but prone to bugs and will put more work on a thin team.

My question therefore is: would that be an acceptable approach for MoltenVK clients? "Closing" MoltenVK as is and moving towards a new version? Can that version be implemented in a time acceptable by the clients?

EDIT: @billhollings Not exactly related to Metal 4, but I see a lot of open Issues marked as "Completed" where the person who opened them says it's fixed, or where there has been no reply for over 2 years. Can't those be closed?

rcaridade145 avatar Jun 11 '25 15:06 rcaridade145

Ignoring any potential Mesa layer, as far as MoltenVK is concerned, I agree with @rcaridade145 that there should be a split between the current 1.x MoltenVK based on Metal 3, and an entirely new branch 2.x based on Metal 4. The potential amount of overhead savings and cleaner code that could be gained with a Metal 4 only approach cannot be ignored, but attempting to blend it with Metal 3 will only add technical debt for the future.

IsaacMarovitz avatar Jun 13 '25 18:06 IsaacMarovitz