Objective

Using multiple raster passes to generate the depth pyramid is extremely slow
Pulling data from the source image is the largest bottleneck, so it's important to sample in a cache-aware pattern
Barriers and pipeline drains between the raster passes are the second largest bottleneck
Each separate RenderPass on the CPU is really expensive

Solution

Port FidelityFX SPD to WGSL, replacing meshlet's existing multiple raster passes with ~~a single~~ two compute dispatches. Lack of coherent buffers means we have to handle the last 64x64 tile, from mip 7 onward, in a separate dispatch to ensure the mip 6 writes are flushed :(
Workgroup shared memory version only at the moment, as the subgroup operations version is blocked by our upgrade to wgpu 0.20 #13186
Don't enforce a power-of-2 depth pyramid texture size, simply scaling by 0.5 is fine
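
To make the sizing and the two-dispatch split above concrete, here is a minimal CPU-side sketch in Rust. It is not code from this PR: `mip_sizes` and `split_levels` are hypothetical helpers, level 0 is assumed to be the source image, and the "halve and round down, clamp to 1" rule is an assumption about what "scaling by 0.5" means.

```rust
use std::ops::Range;

/// Sizes of every level when each one is half the previous one, rounded
/// down and clamped to 1, with no power-of-two requirement.
/// Hypothetical helper, not a Bevy or wgpu API.
fn mip_sizes(width: u32, height: u32) -> Vec<(u32, u32)> {
    let (mut w, mut h) = (width.max(1), height.max(1));
    let mut sizes = vec![(w, h)];
    while w > 1 || h > 1 {
        w = (w / 2).max(1);
        h = (h / 2).max(1);
        sizes.push((w, h));
    }
    sizes
}

/// Splits the downsampled levels (1..level_count) between two dispatches:
/// levels 1 through 6 in the first, level 7 and beyond in the second,
/// so the second only starts once the level 6 writes are visible.
fn split_levels(level_count: usize) -> (Range<usize>, Range<usize>) {
    let first_end = level_count.min(7);
    (1.min(first_end)..first_end, first_end..level_count)
}
```

Under these assumptions a 1920x1080 source yields 11 levels ending at 1x1, with levels 1 to 6 written by the first dispatch and levels 7 to 10 by the second.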

Apr 17 '24 01:04 JMS55

Which parts do you want me to review? Presumably downsample_depth.wgsl? Anything else?

Apr 17 '24 19:04 pcwalton

The downsample shader, yeah. I can't post a link ATM, but you can find the full PR diff by changing the GitHub diff to compare against my meshlet-previous-frame-depth-pyramid branch.

Apr 17 '24 20:04 JMS55

Downscale is broken, will need to debug. Might be the barrier issue.

Apr 24 '24 01:04 JMS55

I believe that there might be an improvement that can be made here:

fn reduce_load_mip_0(tex: vec2u) -> f32 {
    let uv = (vec2f(tex) + 0.5) / vec2f(textureDimensions(mip_0));
    return reduce_4(textureGather(mip_0, samplr, uv));
}

From what I understand from the spec, textureGather already computes the four-component minimum and stores it in the w channel. So there is no need to reduce; just do the following:

fn reduce_load_mip_0(tex: vec2u) -> f32 {
    let uv = (vec2f(tex) + 0.5) / vec2f(textureDimensions(mip_0));
    return textureGather(mip_0, samplr, uv).w;
}

It is very possible that the compiler spots this optimization already.

Frankly, I'm not sure how textureGather works exactly; in Vulkan you need a VK_SAMPLER_REDUCTION_MODE_MIN sampler and the corresponding extension to do this.

Jun 26 '24 21:06 otoomey

textureGather does not return the minimum of the 4 values. The w component is the (u_min, v_min) value of the sample footprint. I.e. given 4 texels (the sample footprint) arranged in a 2x2 quad, the w component is the value at location (u_min, v_min).

Yes, ideally I would be able to use VK_SAMPLER_REDUCTION_MODE_MIN, but unfortunately wgpu does not support it.
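
A small CPU model in Rust may make the difference clearer. This is only a sketch: `gather_2x2` is a hypothetical stand-in for a depth textureGather with the component order from the WGSL spec, it does no edge clamping, and `reduce_4` is assumed to be a plain four-way minimum.

```rust
/// CPU model of a 2x2 gather footprint on a row-major depth image.
/// `(x, y)` is the texel at (u_min, v_min). No edge clamping is done, so
/// the caller must keep `x + 1` and `y + 1` inside the image.
fn gather_2x2(depth: &[f32], width: usize, x: usize, y: usize) -> [f32; 4] {
    let at = |tx: usize, ty: usize| depth[ty * width + tx];
    [
        at(x, y + 1),     // x component: texel at (u_min, v_max)
        at(x + 1, y + 1), // y component: texel at (u_max, v_max)
        at(x + 1, y),     // z component: texel at (u_max, v_min)
        at(x, y),         // w component: texel at (u_min, v_min)
    ]
}

/// Explicit reduction over the four gathered texels
/// (assumed here to be a minimum, as discussed in the thread).
fn reduce_4(v: [f32; 4]) -> f32 {
    v[0].min(v[1]).min(v[2]).min(v[3])
}
```

For a 2x2 image with rows `[0.9, 0.2]` and `[0.5, 0.7]`, the w component is 0.9 (the top-left texel), while the reduced minimum is 0.2, so returning `.w` alone would give a different result.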

Jun 27 '24 01:06 JMS55


Meshlet single pass depth downsampling (SPD)