New features available in Metal 4
Metal 4 was just announced at WWDC 2025 with some new features that may be of interest to MoltenVK.
Updated feature set table: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf
Of particular interest:
- Placement sparse textures and buffers
- Command barriers
This site seems to give a good overview of all new Metal 4 features and API changes: https://developer.apple.com/documentation/metal/understanding-the-metal-4-core-api
The opening sentence of that page makes me happy somehow, since Apple acknowledges the existence of DX and Vulkan explicitly:
> Metal 4 improves runtime performance and memory efficiency through its underlying implementations, while making it easier to adapt your apps and games from other platforms, such as DirectX and Vulkan.
- Metal 4 is based on C++17, and allows lambdas
- Most new types have been moved into a MTL4 namespace, and some newer versions of existing types have been moved here, e.g. MTL4RenderPass, or MTL4Pipeline.
- Asynchronous compilation using MTL4Compiler
- MTL4Archive looks like a replacement for MTLBinaryArchive, however this still seems to take in a URL instead of bytes, sadly.
- Tensors, which are used with the MTL4MachineLearningPipeline
- All of these types seem to make it easier to model the Vulkan command model on top of Metal, in every way shape and form. It gets rid of specialized encoders such as MTLBlitCommandEncoder and MTLAccelerationStructureCommandEncoder and fuses them together into one MTL4ComputeCommandEncoder.
- The command buffer model also now more closely follows the Vulkan model, with a MTL4CommandBuffer, MTL4CommandQueue, and MTL4CommandAllocator, with the last serving effectively as a VkCommandPool, while the other two are VkCommandBuffer and VkQueue, respectively.
- The concept of descriptor sets now exists with MTL4ArgumentTable, which avoids having to set the bindings every time you encode a command buffer
- Automatic hazard tracking has been removed for all the MTL4 pipelines/command encoders. People now need to perform manual synchronization using the barrier methods
- Actual texture views using MTLTextureViewPool
- Metal uses "macOS 16" and "iOS 19" everywhere lol
This would be a great opportunity to clean up MoltenVK:
- Remove support for x86 Macs (so all the managed resource mess can be removed)
- Always use argument buffers
- Remove support for automatic resource tracking
- Move over to Metal 4
- Always encode when the equivalent Vulkan functions get called rather than offering 3 different modes (encode immediately, encode in vkCmdEndCommandBuffer, encode at submission time)
I don't know how feasible this is for MoltenVK users though.
I don't know if non-Apple GPUs need to be supported further, or whether we could just drop them and point users to the last supported version.
But for Metal 4, I think the challenge will be continuing to support a reasonable range of OS versions. So there may be some mess supporting both paths for a while.
Also I wanted to mention that it seems like beyond residency sets there's no other option to convey residency information to the Metal driver, meaning residency sets are now required to be used with any of the MTL4 stuff.
Chiming in on the cleanup/rewrite thing: I could very well see the time coming where we drop support for Metal 1 and 2 and Intel GPUs in a newer version, but keep support for Metal 3. All Apple Silicon chips on macOS will support Metal 4, so it's only a question of supporting old mobile devices.
As I mentioned before, I am generally very much for a MoltenVK that is based on the newest Metal version by default, and only falls back to using Metal 3 (or older) for OSes or devices that do not support these. This should allow us to be able to provide the best possible performance and compatibility for up-to-date devices. Especially in this case, I believe if we do not switch to the new Metal 4 command and binding APIs, MoltenVK could have major performance issues from using the older APIs, especially on any new OSs or Apple GPUs that are likely tailored towards this API over the old ones. Furthermore, the day when Metal 1 and Metal 2 are deprecated or even unusable is probably not too far away given the recent reworks of Metal 3 and Metal 4. I think it's important to remember that Metal 1 is from 2014 and carries a lot of technical debt with it.
Metal 3 sounds like a reasonable target to me at this point. Since you seem to have looked into the new APIs a fair bit, do you have an idea of how challenging it will be to support both Metal 3 and 4 in the same codebase? In terms of how much logic can be reused between the two versus completely separate code paths and resource management, for example. With all the new structures and ways of doing things more aligned with other APIs, I'm not sure how much overlap there is now.
> do you have an idea of how challenging it will be to support both Metal 3 and 4 in the same codebase
Metal 4 has new protocols for:
- MTL4CommandQueue replaces MTLCommandQueue
- MTL4CommandBuffer replaces MTLCommandBuffer
- MTL4RenderCommandEncoder replaces MTLRenderCommandEncoder
- MTL4ComputeCommandEncoder replaces MTLComputeCommandEncoder, MTLBlitCommandEncoder, MTLAccelerationStructureCommandEncoder.
So it would come with significant maintenance overhead to have two code paths for all of that. Probably easier and cleaner to even keep those in different code bases or branches.
Ah, I kind of expected the new versions to at least subclass the old versions, but it appears not.
Should be doable though as long as you'd go all-in on Argument buffers. That would be shared between the Metal 3 command buffer implementation and the Metal 4 one.
Besides that MoltenVK could easily make up for the extra maintenance overhead by removing the encode-at-endCmdBuffer and encode-at-submit modes + all the command serialization.
Related to the Intel GPU conversation, it appears that in the Platform State of the Union, they announced that Tahoe would be the final major update for any Intel Mac.
There's more. Metal 4 doesn't support D3D11-style tessellation. You have to use mesh shaders. We need to finish up the implementation of vertex-pipeline shaders on top of Mesh shaders to be able to use Metal 4.
> There's more. Metal 4 doesn't support D3D11-style tessellation. You have to use mesh shaders. We need to finish up the implementation of vertex-pipeline shaders on top of Mesh shaders to be able to use Metal 4.
The docs seem to indicate you can mix MTL and MTL4 usage to incrementally migrate, so I assume at worst you could still fall back to older constructs to build a tessellation pipeline, while still using newer stuff where possible.
> The docs seem to indicate you can mix MTL and MTL4 usage to incrementally migrate, so I assume at worst you could still fall back to older constructs to build a tessellation pipeline, while still using newer stuff where possible.
You'd have to end the Metal 4 command buffer, start a Metal 3 command buffer, do the tessellated draw, end the Metal 3 command buffer, start a new Metal 4 command buffer.
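To make the cost of that switching a bit more concrete, here is a rough sketch of how a draw stream could be split into runs per backend. Everything in it (`Draw`, `Backend`, `Segment`, `splitByBackend`) is a made-up name for illustration, not MoltenVK or Metal API, and it ignores render pass boundaries entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical draw record: only the flag that matters for this sketch.
struct Draw {
    int id;
    bool usesTessellation;  // true -> must go through the legacy (Metal 3) path
};

enum class Backend { Metal4, Legacy };

// A run of consecutive draws that can share one command buffer.
struct Segment {
    Backend backend;
    std::vector<int> drawIds;
};

// Walks the draws in submission order and starts a new segment every time
// the required backend changes, so the original ordering is preserved.
std::vector<Segment> splitByBackend(const std::vector<Draw>& draws) {
    std::vector<Segment> segments;
    for (const Draw& d : draws) {
        Backend needed = d.usesTessellation ? Backend::Legacy : Backend::Metal4;
        if (segments.empty() || segments.back().backend != needed) {
            segments.push_back(Segment{needed, {}});
        }
        segments.back().drawIds.push_back(d.id);
    }
    return segments;
}
```

Each segment boundary would mean ending one command buffer and starting another, so a frame that interleaves tessellated and non-tessellated draws would pay for that repeatedly.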
> There's more. Metal 4 doesn't support D3D11-style tessellation. You have to use mesh shaders. We need to finish up the implementation of vertex-pipeline shaders on top of Mesh shaders to be able to use Metal 4.
Finish up?
Has it been started somewhere?
I'm wondering if the new flexible pipeline state feature would allow us to get an approximation of VK_EXT_graphics_pipeline_library or even VK_EXT_shader_object? That would be really helpful.
From my quick scan, in both cases, it seems the main sticking point will mostly be the vertex input state info content in VkPipelineVertexInputStateCreateInfo.
And it looks like final compiled pipelines are always required, so we'd be looking at some kind of internal hash-n-cache design of fully-compiled pipelines. But at least the final compilations would be much faster, and cause fewer hitches.
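A minimal sketch of what such a hash-n-cache could look like, assuming the draw-time state can be reduced to a small key. `PipelineKey`, `CompiledPipeline`, and `PipelineCache` are hypothetical names, the key fields are only examples, and the compile step is passed in as a callback rather than calling any real Metal API. It is not thread-safe as written.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical subset of the state that is only known late.
struct PipelineKey {
    std::uint64_t vertexShaderId;
    std::uint64_t fragmentShaderId;
    std::uint32_t vertexInputHash;  // stands in for VkPipelineVertexInputStateCreateInfo
    std::uint32_t colorFormat;

    bool operator==(const PipelineKey& o) const {
        return vertexShaderId == o.vertexShaderId &&
               fragmentShaderId == o.fragmentShaderId &&
               vertexInputHash == o.vertexInputHash &&
               colorFormat == o.colorFormat;
    }
};

struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& k) const {
        std::size_t h = std::hash<std::uint64_t>{}(k.vertexShaderId);
        // Simple hash combiner; any reasonable mixing function would do.
        auto mix = [&h](std::size_t v) {
            h ^= v + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
        };
        mix(std::hash<std::uint64_t>{}(k.fragmentShaderId));
        mix(std::hash<std::uint32_t>{}(k.vertexInputHash));
        mix(std::hash<std::uint32_t>{}(k.colorFormat));
        return h;
    }
};

// Placeholder for a fully compiled pipeline state object.
struct CompiledPipeline {
    PipelineKey key;
};

class PipelineCache {
public:
    using CompileFn = std::function<std::shared_ptr<CompiledPipeline>(const PipelineKey&)>;

    explicit PipelineCache(CompileFn compile) : compile_(std::move(compile)) {}

    // Returns the cached pipeline for this key, compiling it on first use only.
    std::shared_ptr<CompiledPipeline> getOrCompile(const PipelineKey& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        std::shared_ptr<CompiledPipeline> pipeline = compile_(key);
        cache_.emplace(key, pipeline);
        return pipeline;
    }

    std::size_t size() const { return cache_.size(); }

private:
    CompileFn compile_;
    std::unordered_map<PipelineKey, std::shared_ptr<CompiledPipeline>, PipelineKeyHash> cache_;
};
```

The first draw with a new combination still pays for a compile, which is why the faster final compilation matters; every later draw with the same key is just a lookup.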
Thinking out loud on the topic of backwards compatibility...
Metal4 is supported on all Apple Silicon Macs, and as far back as A14 on iOS, which is pretty impressive. And non-AS Macs are now 5 years old. So I don't think hardware is an issue.
Some level of backwards compatibility would be required to support older OS versions for the next several years, I think. Although game and engine devs could certainly opt to abandon their customers who don't upgrade.
The good news is that the overall class structure within MoltenVK is effectively based on Vulkan, not Metal, which gives us a couple of options.
- My first choice would be to keep one main branch, and create *Metal4 subclasses of key objects such as MVKQueue, MVKGraphicsPipeline, MVKCommandEncoder, and maybe MVKDescriptorSet.
  We'd have to require that any improvements and Vulkan extensions that don't explicitly need Metal4 tech be added to the parent classes.
  And maybe it's an MVKConfiguration option to force not to use Metal4, say for an app that needs tessellation shader support.
- As a second choice (and it's a much lower preference for me), we could possibly spawn a Metal4 branch, on which the underlying Metal objects are replaced with Metal4 objects, but would still be generally synchronizable with the main branch.
  We'd have to require that any improvements and Vulkan extensions that don't explicitly need Metal4 support be added to the main branch, constantly sync main into Metal4, and make the Metal4 branch only about Metal objects.
  And as @cdavis5e points out, to support tessellation shaders, we'd need to figure out mesh shaders as part of it, and build mesh shaders from tessellation shaders. Or maybe that's part of the legacy management choice between the two branches that an app dev makes.
  And at some point (years) down the road, when we don't care about legacy at all, we could swap the branches, and rename main to legacy, and Metal4 to main.
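As a very rough sketch of the first option, assuming a runtime check plus a config switch decide which class gets instantiated: the names below (`DeviceCaps`, `Config`, `CommandEncoder`, `CommandEncoderMetal4`, `makeEncoder`) are invented for illustration and are not the actual MoltenVK classes or the real MVKConfiguration fields.

```cpp
#include <memory>
#include <string>

// Hypothetical runtime facts; real code would query the OS and device.
struct DeviceCaps {
    bool supportsMetal4;
};

// Hypothetical stand-in for a configuration switch that forces the older path,
// e.g. for an app that needs tessellation shaders.
struct Config {
    bool forceLegacyMetal = false;
};

// Parent class: everything that does not depend on the Metal version.
class CommandEncoder {
public:
    virtual ~CommandEncoder() = default;

    // Metal-version-specific hook; the parent implements the older path.
    virtual std::string backendName() const { return "Metal3"; }

    // Shared, Vulkan-facing logic lives here and calls the virtual hooks.
    std::string describe() const { return "encoder using " + backendName(); }
};

// Subclass overrides only the Metal-specific hooks.
class CommandEncoderMetal4 : public CommandEncoder {
public:
    std::string backendName() const override { return "Metal4"; }
};

// Picks the subclass when Metal 4 is available and not disabled by config.
std::unique_ptr<CommandEncoder> makeEncoder(const DeviceCaps& caps, const Config& cfg) {
    if (caps.supportsMetal4 && !cfg.forceLegacyMetal) {
        return std::make_unique<CommandEncoderMetal4>();
    }
    return std::make_unique<CommandEncoder>();
}
```

The open question is how much of the real encoder logic could stay in the parent class once the Metal 4 command and binding model is taken into account.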
Adding on to the thread about Metal 3: I think it's perfectly reasonable to just support 3 & 4 (and abandon ancient versions), especially given that Metal 3 is supported on devices going back all the way to 2017 (such as my iMac18,3).
Though abandoning Metal 3 seems like an unwise choice, especially if it forces devs to use an outdated version of MoltenVK that supports Metal 3, thereby incurring a performance loss.
@billhollings
> I'm wondering if the new flexible pipeline state feature would allow us to get an approximation of VK_EXT_graphics_pipeline_library or even VK_EXT_shader_object? That would be really helpful.
I think the more interesting way to do those would be to compile MTLFunction when compiling the individual pipeline libraries or shader objects.
Then for the final linked pipeline you have a VS shader and a FS shader that each do nothing except call the dynamically linked functions. That way the final compilation step should still be really fast and you'd decouple vertex shaders from fragment shaders (which is usually more of a problem than the additional pipeline state surrounding it).
For shader objects, you'd have to manually compile a statically linked version on a background thread; with VK_EXT_graphics_pipeline_library, that's the job of the application.
> And non-AS Macs are now 5 years old. So I don't think hardware is an issue.
This year's OS update is also the last one for x86 Macs.
Good time to remove the fallback to managed resources. I just hope Apple updates the iOS simulator to no longer be stupid like that for absolutely no reason.
> My first choice would be to keep one main branch, and create *Metal4 subclasses of key objects such as MVKQueue, MVKGraphicsPipeline, MVKCommandEncoder, and maybe MVKDescriptorSet.
Isn't that like 60% of MoltenVK? The argument buffer implementation of descriptor sets in MVKDescriptorSet is actually the thing that could be shared besides MVKImage and MVKBuffer.
> Adding on to the thread about Metal 3: I think it's perfectly reasonable to just support 3 & 4 (and abandon ancient versions), especially given that Metal 3 is supported on devices going back all the way to 2017 (such as my iMac18,3).
> Though abandoning Metal 3 seems like an unwise choice, especially if it forces devs to use an outdated version of MoltenVK that supports Metal 3, thereby incurring a performance loss.
Agreed. My preference above is for option (1), subclassing, which would allow us to keep support for Metal 3.
Even with my option (2), continuing to focus any non-Metal4 code in the main branch would allow Metal3 support to continue to evolve.
> The concept of descriptor sets now exists with MTL4ArgumentTable, which avoids having to set the bindings every time you encode a command buffer
I think MTL4ArgumentTable is the same thing as the Argument Buffer from Metal 2.0
> I think MTL4ArgumentTable is the same thing as the Argument Buffer from Metal 2.0
No, it's the new way to store the slot bindings and apply multiple at once. Argument Buffers are still used for bindless. I think for MoltenVK the way forward would be to turn descriptor sets into Argument Buffers and bind them to an Argument Table.
The annoying thing is that apparently they got rid of push constants.
> I think for MoltenVK the way forward would be to turn descriptor sets into Argument Buffers and bind them to an Argument Table.
I would agree that this should be the path forward. Since in Vulkan the descriptor set bindings are tied to the command buffer, we'd need to keep around an argument table per pipeline per command buffer, and then map the descriptor set bindings to that argument table. I think this is very simple, and I guess with some caching mechanism for each pipeline layout that could be improved upon again, so that memory isn't allocated constantly. I could see this being a lot more efficient than the current approach.
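Roughly what I have in mind, as a sketch only: each command buffer owns a small cache of tables keyed by pipeline layout, and a descriptor set bind just writes the set's argument buffer address into a slot. `ArgumentTable`, `PipelineLayout`, and `ArgumentTableCache` are hypothetical stand-ins, not the real MTL4ArgumentTable API, and Vulkan's layout-compatibility rules are ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an argument table: one GPU address per slot.
struct ArgumentTable {
    std::vector<std::uint64_t> slots;
};

// Hypothetical pipeline layout: an id plus the number of descriptor sets.
struct PipelineLayout {
    std::uint32_t id;
    std::size_t setCount;
};

// Owned by one command buffer. Keeps one table per pipeline layout so that
// binding with an already-seen layout does not allocate again.
class ArgumentTableCache {
public:
    ArgumentTable& tableFor(const PipelineLayout& layout) {
        auto it = tables_.find(layout.id);
        if (it == tables_.end()) {
            ArgumentTable fresh{std::vector<std::uint64_t>(layout.setCount, 0)};
            it = tables_.emplace(layout.id, std::move(fresh)).first;
            ++allocations_;
        }
        return it->second;
    }

    // Loosely mirrors vkCmdBindDescriptorSets: the argument buffer address
    // of descriptor set N goes into slot N of the layout's table.
    void bindDescriptorSet(const PipelineLayout& layout, std::size_t setIndex,
                           std::uint64_t argumentBufferAddress) {
        assert(setIndex < layout.setCount);
        tableFor(layout).slots[setIndex] = argumentBufferAddress;
    }

    // Number of tables created so far (useful to confirm reuse).
    std::size_t allocations() const { return allocations_; }

private:
    std::unordered_map<std::uint32_t, ArgumentTable> tables_;
    std::size_t allocations_ = 0;
};
```

A shared pool across command buffers would be the next step to avoid allocating per command buffer, but that needs lifetime tracking that this sketch leaves out.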
> The annoying thing is that apparently they got rid of push constants.
Not really, since it was just a buffer that was managed by the driver itself, and just a constant T* in the shader. That can be easily emulated by effectively doing the same the Metal driver did before, and keeping around small buffers that are always bound to the last (I think that's what we used?) entry in the table.
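A small sketch of that emulation, assuming a per-command-buffer arena that snapshots the push constant bytes and a table whose last slot is reserved for them. `PushConstantArena` and `BindingTable` are hypothetical names, the 16-byte alignment is an assumption, and a real version would bind a GPU address rather than a CPU-side offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU-visible buffer that push constant data is copied into.
class PushConstantArena {
public:
    explicit PushConstantArena(std::size_t capacity) : storage_(capacity), head_(0) {}

    // Copies `size` bytes into the arena and returns the byte offset of the copy.
    // Every call gets its own snapshot, so later updates do not alter earlier draws.
    std::size_t push(const void* data, std::size_t size) {
        const std::size_t alignment = 16;  // assumed alignment for constant data
        std::size_t offset = (head_ + alignment - 1) / alignment * alignment;
        // A real version would grow or recycle the arena instead of asserting.
        assert(offset + size <= storage_.size());
        std::memcpy(storage_.data() + offset, data, size);
        head_ = offset + size;
        return offset;
    }

    const std::uint8_t* at(std::size_t offset) const { return storage_.data() + offset; }

    // E.g. called when the owning command buffer is reset.
    void reset() { head_ = 0; }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t head_;
};

// Hypothetical binding table where the last slot is reserved for push constants.
struct BindingTable {
    std::vector<std::size_t> slotOffsets;

    explicit BindingTable(std::size_t slots) : slotOffsets(slots, 0) {
        assert(slots > 0);
    }

    void setPushConstants(std::size_t arenaOffset) { slotOffsets.back() = arenaOffset; }
};
```

On each vkCmdPushConstants the encoder would call `push` and then `setPushConstants` with the returned offset, which is effectively what the driver used to do behind the scenes.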
I would genuinely like to work on a branch that reworks the entire command queue/encoding code to use the MTL4 APIs by default and fall back to the older ones otherwise, but I simply don't have time to work on this in any capacity, except maybe for testing out other people's ideas or PRs.
> My first choice would be to keep one main branch, and create *Metal4 subclasses of key objects such as MVKQueue, MVKGraphicsPipeline, MVKCommandEncoder, and maybe MVKDescriptorSet.
As I said in my first comment on this issue, this feels a little backwards to me. I doubt that Apple will rework the entire queue, command buffer and command encoding interface any time soon, and given that the old system will probably rarely be used going forward, I think we should use the Metal 4 implementations by default assuming they're available. The naming is fine, I guess, however the old functionality should probably be implemented as specializations behind the Metal 4 stuff as a fallback imho.
Does all this mean that MoltenVK will implement VK_EXT_descriptor_buffer and VK_EXT_shader_object anytime soon?
> Does all this mean that MoltenVK will implement VK_EXT_descriptor_buffer
The problem with VK_EXT_descriptor_buffer was never the descriptor API, as discussed in #1776. The problem was figuring out residency, however I think a full residency set based approach would address that fairly well. However, I will mention that MTL4ArgumentTable does make implementing descriptor buffer a lot easier since it allows binding buffers by their address instead of their handle. Therefore the address <-> buffer handle bookkeeping can be completely removed. If I remember correctly, this was also a problem with the implementation for VK_KHR_acceleration_structure.
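For context, the kind of bookkeeping I mean looks roughly like the sketch below: an ordered map from base address to buffer, searched on every bind to recover the handle for an address. `BufferRange` and `AddressMap` are hypothetical names and this is not the actual MoltenVK code; with binding by address, this lookup would simply not be needed.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical record of one buffer's GPU address range.
struct BufferRange {
    std::uint64_t base;
    std::uint64_t size;
    int handle;  // stands in for a buffer object
};

// Address -> buffer lookup of the sort that binding by handle requires.
class AddressMap {
public:
    void add(const BufferRange& r) { ranges_[r.base] = r; }
    void remove(std::uint64_t base) { ranges_.erase(base); }

    // Finds the buffer whose range contains `address`, if any.
    std::optional<BufferRange> find(std::uint64_t address) const {
        auto it = ranges_.upper_bound(address);  // first range starting after address
        if (it == ranges_.begin()) return std::nullopt;
        --it;                                    // last range starting at or before address
        if (address - it->second.base < it->second.size) return it->second;
        return std::nullopt;
    }

private:
    std::map<std::uint64_t, BufferRange> ranges_;
};
```

Every buffer creation and destruction has to keep this map in sync, and every address-based bind pays for a logarithmic lookup, which is the overhead that would go away.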
I guess this would be another big reason to have a Metal 4 centric implementation, since it makes VK_EXT_descriptor_buffer a lot easier to implement, and probably allows other descriptor extensions to be implemented more easily or at all. Again I want to strongly advocate for a Metal 4 rework of the command encoding, which will also force us to only rely on residency sets which should be a lot more optimal as Apple have said themselves.
I feel like the current design philosophy of keeping backwards compatibility as much as possible is holding us back more than it gains us. @billhollings I would be very interested in some kind of statistics that show which OS versions and which GPUs are most commonly used together with MoltenVK. It's been 5 years since M1, and even longer for the A-series, so I would guess that from a market share perspective these carry probably around 80% of users? There's a point where keeping backwards compatibility is just wasteful. stares at Windows.
Given MoltenVK is being rewritten right now anyway, this seems like a sensible split: "classic" MoltenVK (this repo) targets Metal 3 and keeps support for older hardware, and the new implementation supports only Metal 4.
It's being rewritten?
aren't the mesa guys wanting to rewrite it?
> > There's more. Metal 4 doesn't support D3D11-style tessellation. You have to use mesh shaders. We need to finish up the implementation of vertex-pipeline shaders on top of Mesh shaders to be able to use Metal 4.
> Finish up?
> Has it been started somewhere?
CodeWeavers has their implementation of geometry shaders, which is languishing (see KhronosGroup/SPIRV-Cross#2200). And, I have an incomplete implementation of vertex shaders on top of mesh shaders, which still needs testing and fleshing out.
@spnda
> I doubt that Apple will rework the entire queue, command buffer and command encoding interface any time soon, and given that the old system will probably rarely be used going forward, I think we should use the Metal 4 implementations by default assuming they're available.
I'm not sure I understand what you're saying, or what this has to do with my suggestion of a design that will use subclassing of key MoltenVK objects. Can you elaborate on your concerns? The "assuming they're available" part is what I'm trying to address with a subclass design.
> The problem with VK_EXT_descriptor_buffer was never the descriptor API, as discussed in https://github.com/KhronosGroup/MoltenVK/issues/1776. The problem was figuring out residency, however I think a full residency set based approach would address that fairly well.
Full residency exists in MoltenVK already. It was added a little while ago.
> It's been 5 years since M1, and even longer for the A-series, so I would guess that from a market share perspective these carry probably around 80% of users? There's a point where keeping backwards compatibility is just wasteful. stares at Windows.
As I said above, I don't believe legacy is about hardware, just OS versions. MoltenVK is used by game engines, emulators, a lot of games, and significant non-game software like Autodesk. All those devs will probably not be in a situation to tell their users to "upgrade to *OS 26, or die", for at least the next several years.
@Kobin
> It's being rewritten?
@vaylor27
> aren't the mesa guys wanting to rewrite it?
What @alyssarosenzweig said is misleading. She means that Mesa is developing an Apple-Silicon-only Metal back end. The Mesa Metal back-end is under development, and is expected sometime at the end of this year. It is not connected with MoltenVK at all, and will be a second Vulkan-on-Metal option. Her assumption is that it will make MoltenVK obsolete, except for Intel hardware (and maybe Metal3).
Just sharing my 2 cents..
As a user, I would lean towards a DXVK-style approach: a MoltenVK 1.x branch and a MoltenVK 2.x branch. Ideally, things implemented in one could be transferred to the other.
As a developer who has dealt with similar situations before, I think that idea is very nice, but it is prone to bugs and will put more work on a thin team.
My question therefore is: would that be an acceptable approach for MoltenVK clients? "Closing" MoltenVK as is and moving towards a new version? Can that version be implemented in a time acceptable by the clients?
EDIT: @billhollings Not exactly related to Metal 4, but I see a lot of open issues marked as "Completed" where the person who opened them says it's fixed, or that have had no reply for over 2 years. Can't those be closed?
Ignoring any potential Mesa layer, as far as MoltenVK is concerned, I agree with @rcaridade145 that there should be a split between the current 1.x MoltenVK based on Metal 3, and an entirely new branch 2.x based on Metal 4. The potential amount of overhead savings and cleaner code that could be gained with a Metal 4 only approach cannot be ignored, but attempting to blend it with Metal 3 will only add technical debt for the future.