Pipeline cache API and implementation for Vulkan
Connections
- Fixes #5293 (for Vulkan)
- Required for https://github.com/linebender/vello/issues/289
- See also https://github.com/linebender/vello/pull/459 which validated that this was required
Description
Adds a `PipelineCache` resource to wgpu's API, and wires it down to `vkCreatePipelineCache` on Vulkan.
This cache is added as a parameter to `ComputePipelineDescriptor` and `RenderPipelineDescriptor`.
The parameter is used as expected on Vulkan, and ignored on DirectX, Metal, WebGPU, and GLES.
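For illustration, this is roughly the shape of usage I have in mind, including persisting the cache contents between runs. The exact names here (`create_pipeline_cache`, `PipelineCacheDescriptor`, the `cache` field, `get_data`, and `base_descriptor`) are illustrative of the shape described above rather than quoted from the diff:

```rust
// Sketch only: names are illustrative, not quoted from the diff. `base_descriptor`
// stands in for an existing compute pipeline descriptor elsewhere in the application.
let disk_data: Option<Vec<u8>> = std::fs::read("pipeline_cache.bin").ok();

// Safety: the data must have been written by wgpu on the same adapter and not
// modified since; `fallback` requests an empty cache if the data is rejected.
let cache = unsafe {
    device.create_pipeline_cache(&wgpu::PipelineCacheDescriptor {
        label: Some("pipeline cache"),
        data: disk_data.as_deref(),
        fallback: true,
    })
};

// The descriptors gain an optional cache handle; `None` keeps today's behaviour.
let pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
    cache: Some(&cache),
    ..base_descriptor
});

// Once pipelines have been created, write the (possibly updated) contents back.
if let Some(data) = cache.get_data() {
    let _ = std::fs::write("pipeline_cache.bin", data);
}
```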
Testing
Testing has been performed on a Pixel 6, which validated that the cache works as expected between runs. The test lives in a branch of my fork of Vello. This only exercises the cache for Vulkan pipelines.
Checklist
- [x] Run `cargo fmt`.
- [x] Run `cargo clippy`. If applicable, add:
  - [x] `--target wasm32-unknown-unknown`
  - [x] `--target wasm32-unknown-emscripten` - gets the same errors on `main`, so I assume it's fine
- [x] Run `cargo xtask test` to run tests - I get the same failures on `main`, so I assume it's fine
- [x] Add change to `CHANGELOG.md`. See simple instructions inside file.
Status
This PR is now fully ready for review.
Future Work
I am planning on implementing additional validation inside wgpu that the cache is suitable, as discussed in https://medium.com/@zeuxcg/creating-a-robust-pipeline-cache-with-vulkan-961d09416cda. That is mainly about storing the driver version and file size inline in the file. This should be done before this feature is released, but could be in a different pull request. Please advise if you have any preference here.
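As a sketch of what that inline validation might look like (the field set and names below are my own assumptions, not necessarily what will ship), the cache file would carry a small header that is checked before the payload is handed to the driver:

```rust
// Hedged sketch of the kind of header the linked article suggests.
#[repr(C)]
struct CacheFileHeader {
    driver_version: u32, // taken from the adapter when the cache was written
    data_size: u64,      // length of the payload that follows; detects truncation
}

/// The payload is only handed to the driver if the header still matches the
/// adapter we are running on and the file was written out completely.
fn cache_is_usable(header: &CacheFileHeader, current_driver_version: u32, payload_len: u64) -> bool {
    header.driver_version == current_driver_version && header.data_size == payload_len
}
```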
This could also be extended to other platforms/APIs - see e.g. https://github.com/gfx-rs/gfx/issues/2877. I am not intending to do this work myself.
cc @expenses, as you had a previous investigation in https://github.com/gfx-rs/gfx/issues/3716
Thanks for the review @cwfitzgerald
In terms of validation, my intent was to follow up on that separately, because I wasn't sure I was doing the right thing. So I can definitely add that validation in this PR, since the rest of the shape is approximately right.
One thing I need guidance on is how to hash the data in `wgpu_core`. Are you aware of a preferred library for this? It feels like the sort of thing which is quite opinionated.
I just want some clarification about what you mean by:
> The user will pull the pipeline key from the adapter, hash it, and use it as a filename to store the data.
In my prototype integration, I wasn't hashing the filename. I find it unlikely that users would do so, either. Should we pre-hash the value from the cache key method?
> It shouldn't do any additional validation
This is also slightly troubling, as different APIs have different validation needs. For example, Vulkan probably needs to validate that the driver version is the same, whereas the concept of a driver version isn't exposed at the higher levels.
> For example, Vulkan probably needs to validate the driver version is the same, whereas the concept of a driver version isn't exposed at the higher levels
That's interesting... are there Vk drivers in the wild that accept incompatible pipeline caches from applications, or is this more preemptive (due to the quality issues with some drivers)?
According to the linked article, there have been cases in the past where a driver either:
- Doesn't validate the UUID (which is meant to detect this) at all
- Fails to update the UUID when the cache would actually be incompatible
> This is also slightly troubling, as different APIs have different validation needs. For example, Vulkan probably needs to validate the driver version is the same, whereas the concept of a driver version isn't exposed at the higher levels
The idea is that `wgpu-hal` decides on its pipeline key based on its validation needs, and `wgpu-core` is the one that does the comparison and rejection of the pipeline key. This way the driver version will be part of the `wgpu-hal` pipeline key.
> In my prototype integration, I wasn't hashing the filename. I find it unlikely that users would do so, either. Should we pre-hash the value from the cache key method?
If we make the cache key an opaque type, we can put `into_bytes` and `hash` methods on it.
The key factor which needs to be accounted for is:
- If the driver updates, the old cache you still have is useless.
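For reference, a minimal sketch of that opaque-key shape might look like the following. None of these names come from the PR, and a hasher intended for hash maps is probably not the right long-term choice if the result names a file on disk (see the hashing discussion further down):

```rust
// Sketch of the opaque-key idea; names are assumptions, not the PR's API.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Opaque so callers cannot depend on (or corrupt) the internal layout.
pub struct PipelineCacheKey(Vec<u8>);

impl PipelineCacheKey {
    /// Raw bytes, e.g. for embedding in a cache file header.
    pub fn into_bytes(self) -> Vec<u8> {
        self.0
    }

    /// A hash of the key, e.g. for deriving a file name. A real implementation
    /// would want a hash that is stable across builds and platforms.
    pub fn hash(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.0.hash(&mut hasher);
        hasher.finish()
    }
}
```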
I'm planning on implementing that with effectively two keys: a "device" key (which identifies the device/backend combination) and a further validation key. The device key should relate to the "file name" being used, whereas the validation key is just used to bail from using the cache. That is potentially what you are describing, but I think it's good to clarify.
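To make that concrete, here is a hypothetical sketch of the two-key split; the names and the simple "validation key stored as a prefix of the file" layout are illustrative assumptions, not the PR's design:

```rust
// Hypothetical illustration of the two-key idea described above.
use std::path::Path;

struct CacheKeys {
    /// Identifies the device/backend combination; suitable for use as a file name.
    device_key: String,
    /// Changes whenever stored data becomes unusable (e.g. after a driver update).
    validation_key: Vec<u8>,
}

/// Returns the cached payload only if the validation key stored in the file
/// still matches; otherwise the caller rebuilds the cache from scratch.
fn load_cache(keys: &CacheKeys, dir: &Path) -> Option<Vec<u8>> {
    let bytes = std::fs::read(dir.join(&keys.device_key)).ok()?;
    if bytes.len() < keys.validation_key.len() {
        return None;
    }
    let (stored_key, payload) = bytes.split_at(keys.validation_key.len());
    (stored_key == keys.validation_key).then(|| payload.to_vec())
}
```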
This is now ready for follow-up review. I am now going to test this updated version in Vello, and I also need some guidance on computing the hash (it currently uses a hardcoded value). I'm unlikely to make further functional changes before another review.
Having done some more thinking on this, I have decided to exclude the hashing of the cache data from this PR. The main reasons for this are:
- I believe it is unlikely that a filesystem would corrupt the data in a way which keeps the same length but changes some of the content
- Any attempt at meddling with the file by other processes/applications is already declared out of scope by our safety comment
- We do not already have a dependency which would compute a suitable hash. We could likely produce a hash designed for a hashmap, but that has significantly different requirements from our hashing needs.
I also don't see the necessity of making the pipeline cache key an opaque set of bytes rather than a readable string. These files should be in a location which is rarely accessed by end users. Additionally, if using these as a file name, an application would have to sanitise the bytes to be reasonable for use as a file name. I suspect at least some applications would not do so, and would potentially lose some of the portability which wgpu provides. But I also think it's plausible that I've missed some important reason why this is an incorrect assessment.
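As a small illustration of that point, a readable key can be dropped straight into a path, whereas raw bytes would first need sanitising. The key string and file extension below are invented:

```rust
// Invented example of what a readable, file-name-safe key might look like.
let key = "vulkan-adreno-730-driver-512-615";
let path = cache_dir.join(format!("{key}.wgpu-cache"));
let previous_data = std::fs::read(&path).ok();
```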
Re-requesting review pending CI
Looks to be an issue with the `ash` update.
I'll fix this up and get it in.