bevy
Meshlet single pass depth downsampling (SPD)
Objective
- Using multiple raster passes to generate the depth pyramid is extremely slow
- Pulling data from the source image is the largest bottleneck, so it's important to sample in a cache-aware pattern
- Barriers and pipeline drains between the raster passes are the second largest bottleneck
- Each separate RenderPass on the CPU is really expensive
Solution
- Port FidelityFX SPD to WGSL, replacing meshlet's existing multiple raster passes with ~~a single~~ two compute dispatches. The lack of coherent buffers means we have to do the last 64x64 tile from mip 7+ in a separate dispatch to ensure the mip 6 writes were flushed :(
- Workgroup shared memory version only at the moment, as the subgroup operation version is blocked on our upgrade to wgpu 0.20 (#13186)
- Don't enforce a power-of-2 depth pyramid texture size, simply scaling by 0.5 is fine
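As a rough illustration of the last point, the mip sizes of a non-power-of-2 pyramid can be computed by repeated halving. This is a minimal CPU-side sketch, not the PR's code: the `mip_chain` helper is hypothetical, and the rounding rule (round down, clamp to 1) is an assumption.

```rust
/// Hypothetical helper: sizes of every mip level, starting from the base level.
/// Assumption: each level is the previous one halved, rounded down, and
/// clamped to at least 1 texel per axis. No power-of-2 padding is applied.
fn mip_chain(width: u32, height: u32) -> Vec<(u32, u32)> {
    let mut sizes = vec![(width.max(1), height.max(1))];
    loop {
        // Copy the last size out so the vector can be pushed to afterwards.
        let (w, h) = *sizes.last().unwrap();
        if w == 1 && h == 1 {
            break;
        }
        sizes.push(((w / 2).max(1), (h / 2).max(1)));
    }
    sizes
}
```

For a 1920x1080 base level this yields 11 levels, with odd sizes such as 135 simply rounding down to 67 at the next level.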
Which parts do you want me to review? Presumably downsample_depth.wgsl? Anything else?
The downsample shader, yeah. I can't post a link ATM, but you can find the full PR diff by changing the GitHub diff to compare against my meshlet-previous-frame-depth-pyramid branch.
Downscale is broken, will need to debug. Might be the barrier issue.
I believe that there might be an improvement that can be made here:
fn reduce_load_mip_0(tex: vec2u) -> f32 {
    let uv = (vec2f(tex) + 0.5) / vec2f(textureDimensions(mip_0));
    return reduce_4(textureGather(mip_0, samplr, uv));
}
From what I understand from the spec, textureGather already computes the four component minimum and stores it in the w channel. So there is no need to reduce, just do the following:
fn reduce_load_mip_0(tex: vec2u) -> f32 {
    let uv = (vec2f(tex) + 0.5) / vec2f(textureDimensions(mip_0));
    return textureGather(mip_0, samplr, uv).w;
}
It is very possible that the compiler spots this optimization already.
Frankly I'm not sure how textureGather works exactly; in Vulkan you needed a VK_SAMPLER_REDUCTION_MODE_MIN sampler and the corresponding extension to do this.
textureGather does not return the minimum of the 4 values. The w component is the (u_min, v_min) value of the sample footprint. I.e. given 4 texels (the sample footprint) arranged in a 2x2 quad, the w component is the value at location (u_min, v_min).
Yes, ideally I would be able to use VK_SAMPLER_REDUCTION_MODE_MIN, but unfortunately wgpu does not support it.
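To make the component ordering discussed above concrete, here is a small CPU-side model of a single-channel gather. It is only a sketch: `gather` is a hypothetical stand-in for textureGather on a 2x2 footprint, and `reduce_4` is assumed to take the minimum of the four values, as the thread implies (its real definition is not shown here).

```rust
/// Hypothetical CPU model of a single-channel `textureGather`.
/// `texels` is row-major; (`x`, `y`) is the (u_min, v_min) texel of the 2x2 footprint.
/// Components follow the WGSL order:
/// x = (u_min, v_max), y = (u_max, v_max), z = (u_max, v_min), w = (u_min, v_min).
fn gather(texels: &[f32], width: usize, x: usize, y: usize) -> [f32; 4] {
    let at = |u: usize, v: usize| texels[v * width + u];
    [at(x, y + 1), at(x + 1, y + 1), at(x + 1, y), at(x, y)]
}

/// Assumed reduction: the minimum of the four gathered values.
fn reduce_4(v: [f32; 4]) -> f32 {
    v[0].min(v[1]).min(v[2].min(v[3]))
}
```

With a 2x2 texture holding `[0.9, 0.2, 0.5, 0.7]` in row-major order, the w component of the gather is 0.9 (the texel at (u_min, v_min)), while `reduce_4` returns 0.2, so taking `.w` alone would not give the minimum.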