Vulkan timestamp queries can return 0 if resolved too soon

Open Dinnerbone opened this issue 1 year ago • 1 comments

Description Whilst debugging https://github.com/Wumpf/wgpu-profiler/issues/84 I narrowed it down to a fairly small piece of code which returns 0 on Vulkan and a valid timestamp on DX12/GL. I've copied a minimal repro below.

I haven't debugged further because I don't have a grasp of Vulkan synchronization, but I suspect we aren't waiting for the timestamp query to be resolved before reading from it. My understanding is that wgpu should be handling this behind the scenes with barriers transparently - at least, I don't see anything in the docs to suggest otherwise.

If we move the copy_buffer_to_buffer to another command encoder that gets submitted immediately after, it seems to work fine.

Repro steps

let (device, queue) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                required_features: wgpu::Features::TIMESTAMP_QUERY
                    | wgpu::Features::TIMESTAMP_QUERY_INSIDE_ENCODERS,
                ..Default::default()
            },
            None,
        )
        .await
        .expect("Failed to create device");

    let query_set = device.create_query_set(&wgpu::QuerySetDescriptor {
        label: Some("query_set"),
        ty: wgpu::QueryType::Timestamp,
        count: 1,
    });

    let resolve_buffer = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("resolve_buffer"),
        size: wgpu::QUERY_SIZE as _,
        usage: wgpu::BufferUsages::QUERY_RESOLVE | wgpu::BufferUsages::COPY_SRC,
        mapped_at_creation: false,
    });

    let map_buffer = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("map_buffer"),
        size: wgpu::QUERY_SIZE as _,
        usage: wgpu::BufferUsages::MAP_READ | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    });

    // ---- problem code
    let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("encoder"),
    });
    encoder.write_timestamp(&query_set, 0);
    encoder.resolve_query_set(&query_set, 0..1, &resolve_buffer, 0);
    encoder.copy_buffer_to_buffer(&resolve_buffer, 0, &map_buffer, 0, wgpu::QUERY_SIZE as _);
    queue.submit([encoder.finish()]);
    // ----

    map_buffer.slice(..).map_async(wgpu::MapMode::Read, |_| ());
    device.poll(wgpu::Maintain::Wait);
    let view = map_buffer.slice(..).get_mapped_range();
    let timestamp = u64::from_le_bytes((*view).try_into().unwrap());
    dbg!(timestamp); // 0 on Vulkan, a valid timestamp on DX12

If we change the encoder to something like this, it works fine:

    let mut encoder1 = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("encoder 1"),
    });
    encoder1.write_timestamp(&query_set, 0);
    encoder1.resolve_query_set(&query_set, 0..1, &resolve_buffer, 0);
    let mut encoder2 = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("encoder 2"),
    });
    encoder2.copy_buffer_to_buffer(&resolve_buffer, 0, &map_buffer, 0, wgpu::QUERY_SIZE as _);
    queue.submit([encoder1.finish(), encoder2.finish()]);

Expected vs observed behavior I'd expect wgpu to perform the synchronization behind the scenes and return a correct value, or if this is user error, for the docs to say what manual synchronization needs to happen here.

Extra materials None

Platform Windows 10 AMD Radeon RX 6900 XT wgpu 22.1.0 and also 2b15a2b24b69e105ebdbb5e8a859e2df75e441c1 (latest commit at time of writing)

Oct 13 '24 14:10 Dinnerbone

Just noting here the vulkan spec says this is a copy_dest operation.

Oct 14 '24 00:10 cwfitzgerald