MoltenVK WIP: Add: KHR

This adds support for KHR_maintenance4:

Allow the application to destroy their VkPipelineLayout object immediately after it was used to create another object. It is no longer necessary to keep its handle valid while the created object is in use.

this was already possible with the current code, as the layout defining data gets copied into MVKPipeline when creating.
Add a new maxBufferSize implementation-defined limit for the maximum size VkBuffer that can be created.
Add support for the SPIR-V 1.2 LocalSizeId execution mode, which can be used as an alternative to LocalSize to specify the local workgroup size with specialization constants.

SPIRV-Cross already handles this.
Add a guarantee that images created with identical creation parameters will always have the same alignment requirements.
Add new vkGetDeviceBufferMemoryRequirementsKHR, vkGetDeviceImageMemoryRequirementsKHR, and vkGetDeviceImageSparseMemoryRequirementsKHR to allow the application to query the image memory requirements without having to create an image object and query it.

This took me a bit to implement, as I initially wanted separate code for evaluating the memory requirements. However, this would duplicate a lot of code and it's hard to directly unify it because of static contexts, so I decided to go with this simple approach of just creating an implicit MVKBuffer or MVKImage object. Not sure I like this that much but it does pass CTS. I want to hear what others have to say how this could be done in the MVK codebase.
Relax the requirement that push constants must be initialized before they are dynamically accessed.

I think this is fine with Metal, as the buffers in the argument table are initially nullptr afaik. So its valid to not set the buffers if they're not used (which is what my interpretation of this sentence is).
Relax the interface matching rules to allow a larger output vector to match with a smaller input vector, with additional values being discarded.

SPIRV-Cross already handles this case as far as I can tell. I used an example Vulkan application where the vertex outputs a vec4 and the fragment accepts a vec2 or vec3, and it is handled correctly in the conversion process. It correctly replaces the float2/float3 with a float4 and then uses vector swizzling to discard the unused values.
Add a guarantee for buffer memory requirement that the size memory requirement is never greater than the result of aligning create size with the alignment memory requirement.

Unrelated, but I did some maintenance in the MVKImage and MVKDevice files in code related to this change. Mainly around the getExtent3D, getBytesPerRow, and getBytesPerLayer functions as they were repeating function calls multiple times. getBytesPerLayer would call getExtent3D and getBytesPerRow, which also calls getExtent3D. Additionally, the caller of getBytesPerLayer would also call getExtent3D. My changes drastically reduce the amount of unnecessary function calls made in that area.

Jan 06 '24 11:01 spnda

Not sure I like this that much

I don't like it either, considering that the whole point of the new memory requirements functions is that you don't have to create a resource just to figure how much memory it needs. I think it should be possible to refactor MVKBuffer::getMemoryRequirements() and MVKImage::getMemoryRequirements() to be methods on the MVKDevice object; then you could have MVKBuffer and MVKImage call back to MVKDevice, passing the appropriate information. This maps better to Metal anyway: the corresponding methods in Metal are MTLDevice methods.

Jan 10 '24 07:01 cdavis5e

Not sure I like this that much

I don't like it either, considering that the whole point of the new memory requirements functions is that you don't have to create a resource just to figure how much memory it needs. I think it should be possible to refactor MVKBuffer::getMemoryRequirements() and MVKImage::getMemoryRequirements() to be methods on the MVKDevice object; then you could have MVKBuffer and MVKImage call back to MVKDevice, passing the appropriate information. This maps better to Metal anyway: the corresponding methods in Metal are MTLDevice methods.

I originally had it setup this way but it required a lot of code to be duplicated (especially on images with computing usage, formats, atomics, mips, layers, ...), and I wasn't sure I had everything setup correctly. The only thing I came up with that I would like is having a separate MVKBufferInfo/MVKImageInfo struct which contains basic information used to compute the memory requirements which gets created in every MVKBuffer/MVKImage but also the memory requirement functions on MVKDevice. Those structs would then have the getMemoryRequirement functions.

I originally decided to just open the PR like this and then talk a bit about a possible better implementation. Though I will definitely look into shifting the functions into MVKDevice again.

Jan 10 '24 09:01 spnda

Full CTS testing of this PR caused at least one CTS crash, because vkGetDeviceImageMemoryRequirements() tried to access MVKImage::_memoryBindings out of bounds.

This exposed the fragility of how we were mapping plane indexes to memory bindings. PR #2125 fixes that fragility, and we should rebase this PR to after #2125, once it's merged (note that PR has been pushed to a temporary branch 1.2.8-temp until after the current SDK process is complete).

In particular, any references in this PR to _memoryBindings should use getMemoryBinding() instead. In particular, the new getMemoryRequirements(VkMemoryRequirements2...) should become:

VkResult MVKImage::getMemoryRequirements(VkMemoryRequirements2 *pMemoryRequirements, uint8_t planeIndex) {
	VkResult rslt = getMemoryRequirements(&pMemoryRequirements->memoryRequirements, planeIndex);
	if (rslt != VK_SUCCESS) { return rslt; }
	return getMemoryBinding(planeIndex)->getMemoryRequirements(nullptr, pMemoryRequirements);
}

Jan 11 '24 16:01 billhollings

Full CTS testing of this PR caused at least one CTS crash, because vkGetDeviceImageMemoryRequirements() tried to access MVKImage::_memoryBindings out of bounds.

This exposed the fragility of how we were mapping plane indexes to memory bindings. PR #2125 fixes that fragility, and we should rebase this PR to after #2125, once it's merged (note that PR has been pushed to a temporary branch 1.2.8-temp until after the current SDK process is complete).

In particular, any references in this PR to _memoryBindings should use getMemoryBinding() instead. In particular, the new getMemoryRequirements(VkMemoryRequirements2...) should become:
VkResult MVKImage::getMemoryRequirements(VkMemoryRequirements2 *pMemoryRequirements, uint8_t planeIndex) {
	VkResult rslt = getMemoryRequirements(&pMemoryRequirements->memoryRequirements, planeIndex);
	if (rslt != VK_SUCCESS) { return rslt; }
	return getMemoryBinding(planeIndex)->getMemoryRequirements(nullptr, pMemoryRequirements);
}

Interesting. I only ever tested some parts of the CTS test because I couldn't figure out if there is any txt file that contained all of the tests related to KHR_maintenance4 or any specific extension for that matter. For future reference, how do I figure out an exhaustive list of all CTS tests related to a feature, an extension, or a specific Vulkan core version (say Vulkan 1.0 conformance)?

Jan 11 '24 17:01 spnda

Interesting. I only ever tested some parts of the CTS test because I couldn't figure out if there is any txt file that contained all of the tests related to KHR_maintenance4 or any specific extension for that matter. For future reference, how do I figure out an exhaustive list of all CTS tests related to a feature, an extension, or a specific Vulkan core version (say Vulkan 1.0 conformance)?

It's complicated. Sometimes an extension will include a large number of CTS test that have part of the extension name embedded in the test name. In this case, AFAICT there are only 3 CTS tests that have maintenance4 in their name, but there are hundreds of other tests that have code like if(mtce4) embedded in them, to decide between two code paths that may or may not be particularly connected to KHR_maintenance4.

If the extension has decent CTS coverage, I don't always run full CTS tests on every extension addition, but since we were close to SDK release, I was doing that, in case we could squeeze this PR into the current SDK release.

BTW...that full CTS run also exposed a couple of hundred new test failures along the lines of:

Test case 'dEQP-VK.pipeline.monolithic.interface_matching.vector_length.out_vec4_in_vec3_member_of_block_vert_out_frag_in'..
  NotSupported (VK_KHR_maintenance4 is not supported at vktTestCase.cpp:824)

becomes:

Test case 'dEQP-VK.pipeline.monolithic.interface_matching.vector_length.out_vec4_in_vec3_member_of_block_vert_out_frag_in'..
[mvk-error] VK_ERROR_INITIALIZATION_FAILED: Render pipeline compile failed (Error code 3):
Fragment input(s) `user(locn1)` mismatching vertex shader output type(s) or not written by vertex shader.
  Fail (retcode: VK_ERROR_INITIALIZATION_FAILED at vkPipelineConstructionUtil.cpp:186)

From the error text, it doesn't look like it's related to VK_KHR_maintenance4 specifically, but the extension seems to be exposing a code path in the test that is triggering a shader translation matching error.

In such a case, it might be good to pull the KHR_maintenance4 extension in once it's ready, and then address these in a separate PR, after.

Jan 11 '24 18:01 billhollings

Yes this should probably be a WIP PR. I didn't fully test it and the implementation of the memory requirements functions wasnt yet "finished". But thanks, I'll look into the CTS tests more and see where the issues are coming from.

Jan 11 '24 18:01 spnda

Yes this should probably be a WIP PR. I didn't fully test it and the implementation of the memory requirements functions wasnt yet "finished". But thanks, I'll look into the CTS tests more and see where the issues are coming from.

The important thing when adding any new extension is to try to avoid CTS regressions (ie. Pass -> Fail).

For Not Supported -> Fail, it depends on whether the now-failing functionality is caused by the new extension, or it's just exposing more examples of existing failures occurring on other tests (ie- dups of the existing 11K test failures). If the former, then it should be fixed in the extension PR. If the latter, then it depends on the priority relative to the extension (is the extension still useful). Otherwise, we risk it being asymptotically hard to add a useful extension because some existing (unrelated & non-critical) bug is still waiting to be addressed.

In this case, the mismatching vertex shader output type(s) or not written by vertex shader error does not seem to appear without KHR_maintenance4, so it should probably be addressed here.

Jan 11 '24 19:01 billhollings

I just want to put some of this information here for tracking. I've tried to find all CTS tests related to the extension, and ran them. There's a few things that fail with the currently pushed code:

Images with transient usage (so lazily allocated memory) seem to have different memory types depending on the image usage flags. (one has INPUT_ATTACHMENT_BIT, while the other has COLOR_ATTACHMENT_BIT)
SPIRV-Cross fails to generate valid MSL for some of the vector conversions, which seem fixable, but I think they should be fixed before merging this as it's directly related to the extension. These seem to occur for all tessellation tests, where the vector length differs, such as dEQP-VK.pipeline.monolithic.interface_matching.vector_length.out_vec4_in_vec3_loose_variable_vert_out_tesc_in_tese_frag.

[mvk-error] VK_ERROR_INITIALIZATION_FAILED: Shader library compile failed (Error code 3):
program_source:22:38: error: member reference base type 'const device unsigned int' is not a structure or union
    float _34 = float(gl_in[0].m_39.x.x == 4u) * float(gl_in[0].m_39.y.x == 16u);
                      ~~~~~~~~~~~~~~~^~
program_source:22:71: error: member reference base type 'const device unsigned int' is not a structure or union
    float _34 = float(gl_in[0].m_39.x.x == 4u) * float(gl_in[0].m_39.y.x == 16u);
                                                       ~~~~~~~~~~~~~~~^~

Related to the same tests, SPIRV-Cross does handle some vector conversions correctly, but obviously needs to cover more cases as required by this extension. The cases I tested with direct values (so layout(location = 0) vec2 xy all pass, but as soon as there's a type of a different vector length in an array or a structure they fail with the error you posted already.

Firstly, I will refactor the memory requirement code and then look at SPIRV-Cross. I currently have a concept which adds a MVKBufferInfo and MVKImageInfo which both inherit from a MVKResourceInfo. These classes are then also used by their respective resource classes for calculating the memory requirements, as well as be created by an MVKDevice for the static memory requirement functions. I initially tried to tackle this with virtual inheritance but that quickly became a shitshow, so I'm just placing the class objects as members into the resource classes.

Jan 17 '24 22:01 spnda

Images with transient usage (so lazily allocated memory) seem to have different memory types depending on the image usage flags. (one has INPUT_ATTACHMENT_BIT, while the other has COLOR_ATTACHMENT_BIT)

To fix this properly, you need to implement input attachments as shader framebuffer fetch. SPIRV-Cross is already done; you just need to wire it up in MoltenVK. The hard part is mapping Vulkan input attachments to Metal render target indices.

Jan 17 '24 23:01 cdavis5e

Any updates on this?

Jul 08 '24 09:07 corporateshark

WIP: Add: KHR_maintenance4