Pipeline Caching (between program executions)
**Is your feature request related to a problem? Please describe.**
In Vello on Android, the time taken to start the app up is unacceptably long (~2 seconds after linebender/vello#455). This is an order of magnitude longer than users would expect app startup to take.
The vast majority of this startup time is spent in calls to `wgpu::Device::create_compute_pipeline`. This is because each shader is compiled from scratch to device microcode on every run.
**Describe the solution you'd like**
We would like wgpu to provide an unsafe API for `VkPipelineCache` objects, to allow reusing device microcode compilation between executions.
My proposed API would be for the application to provide a path to a directory for this cache to be stored in/retrieved from.
When creating a pipeline cache object, I would expect wgpu to attempt to read from the file `wgpu_vulkan.cache` (or an alternative name) when initialised with a Vulkan backend, then create a pipeline cache from this value.
This would also perform sanity checks on the device version, probably by including an additional custom header (as discussed in *Creating a Robust Pipeline Cache with Vulkan*).
A method would then also be added to the cache to write the data from the cache back to disk.
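For illustration, the kind of custom header meant above could carry the fields needed for those sanity checks. This is only a sketch; every name in it is an assumption rather than a settled format:

```rust
// Illustrative only: a possible custom header for the on-disk cache file,
// following the validation scheme from "Creating a Robust Pipeline Cache
// with Vulkan". All field names here are assumptions, not a settled format.
#[repr(C)]
struct CacheFileHeader {
    magic: [u8; 4],       // e.g. b"WGPU": rejects files that aren't ours
    header_version: u32,  // bumped whenever this header layout changes
    vendor_id: u32,       // checked against VkPhysicalDeviceProperties::vendorID
    device_id: u32,       // checked against VkPhysicalDeviceProperties::deviceID
    driver_version: u32,  // checked against VkPhysicalDeviceProperties::driverVersion
    cache_uuid: [u8; 16], // checked against pipelineCacheUUID
    data_size: u64,       // length of the cache payload that follows
    data_hash: u64,       // checksum of the payload, to catch truncation/corruption
}
```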
**Describe alternatives you've considered**
Variations on the proposed solution:
1. The API could instead accept and output the data as an opaque blob. This would, however, run into complications if it were extended to cover multiple backends, as the data for each backend would need to be loaded from disk even when only one backend was in use.
2. The API could store the Vulkan-specific data format - that is, saving the data which would be passed to `PipelineCacheCreateInfoBuilder::initial_data` directly in the file. This would leave it up to applications to implement any sanity checking beyond that provided by the drivers.
3. Combining 1 and 2, i.e. accepting a `&[u8]` which will be passed as-is to `PipelineCacheCreateInfoBuilder::initial_data`.
4. The APIs could be `fn(impl FnOnce(Backend) -> Result<Vec<u8>, E>) -> Result<Vec<u8>, E>` or `impl FnOnce(Backend, Vec<u8>) -> R`. This would alleviate most concerns, and is probably the right approach.
Alternative solutions:
- wgpu could automatically implement this pipeline caching, without requiring manual implementation from user apps. I believe this is likely to be untenable for several reasons:
  - wgpu cannot know where the cache data should be stored, without implementing suspect heuristics based on e.g. the executable name
  - wgpu cannot trust the cache data, as it could be modified or corrupted by the end user of the application or by other programs. This means wgpu would have unavoidable unsoundness[^1]
  - wgpu cannot know when to save this cache data, as the usage of wgpu is dependent on the application. E.g. some applications may initialise additional shaders later in their program execution
- wgpu could allow us to pass an `ash::vk::PipelineCache` to `wgpu::Device::create_compute_pipeline` - either in the `ComputePipelineDescriptor` or through a new method. I suspect this is untenable, as it would specialise the API to the Vulkan backend.
- wgpu could allow us to pass an `ash::vk::PipelineCache` to `wgpu::hal::vulkan::Device::create_compute_pipeline`, and allow creating a `wgpu::ComputePipeline` from a `wgpu::hal::vulkan::ComputePipeline`. I don't know why this second aspect is currently not permitted.
**Additional context**
I have not researched other backends' caching APIs.
I have implemented an experiment to justify this requirement. The code of that experiment can be found in linebender/vello#459, which depends on #5292. On my device (a Google Pixel 6), this reduces pipeline creation time (on runs after the first) from ~2s to ~30ms, and empirically makes the test app launch as quickly as I'd expect an app to launch.
I suggest that the full API for the pipeline cache could look something like:
```rust
// Sketch only: method bodies, error types, and existing fields are elided.
#[derive(Clone)]
struct PipelineCache; // Probably not a unit struct in the final iteration

impl Device {
    unsafe fn create_pipeline_cache(&self, folder: PathBuf) -> PipelineCache;
}

struct ComputePipelineDescriptor {
    cache: PipelineCache,
}

impl PipelineCache {
    fn write_to_disk(&self) -> io::Result<()>;
}
```
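Usage under this sketch might then look roughly as follows (the variable names are illustrative, and error handling is elided):

```rust
// Hypothetical usage of the path-based sketch above.
let cache = unsafe { device.create_pipeline_cache(cache_dir.clone()) };

let pipeline = device.create_compute_pipeline(&ComputePipelineDescriptor {
    cache: cache.clone(),
    // ...the descriptor's existing fields (label, layout, module, entry_point)...
});

// Once the app has created all of its pipelines, persist the warm cache so
// the next launch can skip compiling microcode from scratch.
cache.write_to_disk()?;
```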
Future possibilities could allow using APIs such as `vkMergePipelineCaches`.
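Purely as an illustration (the method name and error type below are made up), such an extension might be surfaced as:

```rust
impl PipelineCache {
    /// Hypothetical future method: fold the contents of `others` into `self`,
    /// lowering to vkMergePipelineCaches on the Vulkan backend. This would be
    /// useful when pipelines are compiled on several threads, each feeding its
    /// own cache, which are then merged before being written out.
    pub unsafe fn merge_from(&self, others: &[&PipelineCache]) -> Result<(), PipelineCacheError>;
}
```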
I am willing to implement this in wgpu, but need guidance around how adding new resources should look, as well as the expected API.
[^1]: This unsoundness is also unavoidable on the part of any programs using this feature of wgpu, but that's a tradeoff some users (including Vello) are able to justify.
Note that pipeline caches could have an even greater impact for those using the d3d12 backend who are stuck with FXC for one reason or another.
I don't think that a pipeline caching API should involve the file system. For an application that wants to tighten its process sandboxing, having low level graphics middleware assume it has file system access is a bit of a nightmare. And more generally the app author should be in control of how the cache is stored.
Instead it should take/produce a binary blob (which is what the file system version would have to work with under the hood anyway).
> The API could instead accept and output the data as an opaque blob. This would, however, run into complications if it were extended to cover multiple backends, as the data for each backend would need to be loaded from disk even when only one backend was in use.
If an application author is not consistently picking the same adapter and device then caching won't work regardless of the backend. So loading a cache built from the wrong backend is equivalent to loading a cache built from the wrong device or driver version. If for some reason an app author wants to use multiple devices/backends then they'll have to manage multiple blobs (or use a single blob that will work with some backends but won't provide speedups with others).
Interesting note on d3d12, although I'm not likely to implement support for that backend myself.
So perhaps the correct key for the cache selection isn't only `Backend`, but also includes the device vendor and device id (at least for the Vulkan backend). I don't see the point in making these blobs hard to manage when we can provide an API which encourages good behaviour. In most cases it will only ever be for the same pipeline, but I don't see why we shouldn't make a pit of success.
As I mentioned on Matrix, I agree that assuming a filesystem isn't good, which is why I mentioned variation 4 (which I added slightly later, hence it not being the primary option).
So to summarise, my proposed API would be something like:
```rust
use std::fmt::{self, Display};

pub enum PipelineCacheKey {
    Vulkan { device_id: u32, vendor_id: u32 },
    // ...
}

impl Display for PipelineCacheKey {
    fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PipelineCacheKey::Vulkan { device_id, vendor_id } => {
                write!(fmt, "vulkan_{vendor_id}_{device_id}")
            }
        }
    }
}

impl Device {
    pub unsafe fn create_pipeline_cache<E>(
        &self,
        data: impl FnOnce(PipelineCacheKey) -> Result<Option<Vec<u8>>, E>,
    ) -> Result<PipelineCache, PipelineCacheCreationError<E>>;
}

impl PipelineCache {
    pub fn get_data(&self, device: &Device) -> (Vec<u8>, PipelineCacheKey);
}
```
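To illustrate how an application might drive this API, here is a hedged sketch; the file layout, `cache_dir`, and error handling are assumptions on my part, not part of the proposal:

```rust
// Hypothetical caller: the application owns all filesystem access, and the
// key ensures a blob from a different backend/device is never loaded.
let cache = unsafe {
    device.create_pipeline_cache(|key: PipelineCacheKey| {
        let path = cache_dir.join(format!("{key}.bin"));
        match std::fs::read(&path) {
            Ok(bytes) => Ok(Some(bytes)),
            // A missing file just means a cold cache, not a failure.
            Err(e) if e.kind() == std::io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    })
}
.expect("failed to create pipeline cache");

// ...create pipelines using `cache`...

// Persist the warm cache for the next run, named after its key.
let (bytes, key) = cache.get_data(&device);
std::fs::write(cache_dir.join(format!("{key}.bin")), bytes)
    .expect("failed to write pipeline cache");
```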