
Support debug info such as API call counts or resource counters

Open mikialex opened this issue 7 months ago • 4 comments

Is your feature request related to a problem? Please describe.

When developing a complex graphics project, it's essential to have easy access to debug info from your graphics API layer: for example, a resettable counter of API calls (draw calls or any other calls) to track down performance issues such as unnecessary work that should be optimized; a live resource (buffer, texture) instance counter to check whether any resource is leaking; and a memory counter to identify abnormal consumption.

A graphics debugger has all the information we need, but easy programmatic access matters because the application is interactive (the debug information can be shown live in the UI). Some problems are hard to expose and explore offline.

Currently, the common solution is to wrap the wgpu objects and do the counting in an encapsulation layer. This is tedious, and I assume every serious project that uses wgpu ends up writing this kind of encapsulation, so I think wgpu should provide the feature directly.

Describe the solution you'd like

The newly introduced extensibility support in the wgpu layer (such as DispatchDevice) seems like a good place to inject these implementations. The API could be a utility function that converts a Device into (Device, DebugInfoPort): the returned device is instrumented, and the user accesses debug info through the DebugInfoPort.

mikialex avatar May 22 '25 06:05 mikialex

wgpu::Device already has get_internal_counters. It's still a bit basic and could be extended to cover more, but I think it addresses that need. Or did you have something else in mind?

Wumpf avatar May 22 '25 08:05 Wumpf

If implemented completely, the current internal counters can address part of my request. However, I still have two thoughts:

  • The internal counters mainly count the hal objects that wgpu maintains, so their purpose is mainly to debug wgpu-core issues, since hal objects may have different lifetimes than wgpu objects. This is useful for downstream wgpu users as well, but what I really want is instance counting at the wgpu API level, so I can directly debug misuse of the wgpu-level API.
  • wgpu-level instance counting could be added to the current internal counters, but I really think DispatchDevice is the better place to implement it:
    • It directly hooks the API calls and does simple counting; it's easy to understand and clean to maintain.
    • It can be injected dynamically, doesn't depend on a feature flag, and is zero-cost when unused.

Based on the DispatchDevice-related API, these independent features can be implemented and composed by the user:

  • Counting live API object instances.
  • Counting API calls.
  • Memory usage statistics.

mikialex avatar May 23 '25 02:05 mikialex

I'm not entirely sure counting wgpu objects themselves and exposing that is in scope for the wgpu crate itself. It's surely a nice feature, but as you pointed out it can be done externally and transparently via the dispatch device mechanism. With that mechanism, this could fairly easily be a separate crate used at the application layer that owns the device (all dependencies that get a wgpu device injected would pick up your counters without knowing). Would be interesting to explore that! The hal counters I pointed at, on the other hand, are more low-level and thus much closer to actual resource overhead (still not great for actual memory use, by the way; for that you'd likely want to query API- or even vendor-specific extensions), so they cover your needs better if you're interested in resource tracking rather than the behavior of your application.

Wumpf avatar May 23 '25 07:05 Wumpf

I did some experimental implementation based on the latest wgpu trunk. It turns out that if I want to wrap and proxy the wgpu implementation, I have to access the inner dispatch object of the wgpu API.

For example:

#[derive(Debug)]
pub struct InstrumentedDevice {
    pub internal: DispatchDevice,
    pub buffer_stat: Arc<RwLock<BufferStatistics>>,
}

impl InstrumentedDevice {
    // users call this API to instrument a device
    pub fn wrap(internal: Device) -> (Device, Arc<RwLock<BufferStatistics>>) {
        let buffer_stat = Default::default();
        let device = Device::from_custom(InstrumentedDevice {
            internal: internal.get_internal_dispatch(), // this missing api should be supported by wgpu
            buffer_stat: buffer_stat.clone(),
        });
        (device, buffer_stat)
    }
}


impl DeviceInterface for InstrumentedDevice {
    // ...
}

Implementing get_internal_dispatch (or From<DispatchInstance> for Instance) is easy because it just exposes the underlying type. Should I add this support to wgpu?

mikialex avatar Jun 13 '25 03:06 mikialex

Why can't you just wrap wgpu::Device like:

#[derive(Debug)]
pub struct InstrumentedDevice {
    pub internal: wgpu::Device,
    pub buffer_stat: Arc<RwLock<BufferStatistics>>,
}

impl InstrumentedDevice {
    // users call this wrapper to instrument a device
    pub fn wrap(internal: wgpu::Device) -> (Device, Arc<RwLock<BufferStatistics>>) {
        let buffer_stat = Default::default();
        let device = Device::from_custom(InstrumentedDevice {
            internal,
            buffer_stat: buffer_stat.clone(),
        });
        (device, buffer_stat)
    }
}

impl DeviceInterface for InstrumentedDevice {
//...
}

sagudev avatar Jun 24 '25 11:06 sagudev

Count the living API object instance. Count the API call. Memory usage statistics

These are what the internal counters and the allocator report are for. No need to introduce a new concept for this.

nical avatar Jun 24 '25 16:06 nical

@sagudev I already tried that approach, and it doesn't work well. For example, how do you implement any DeviceInterface member function that returns another "dispatch" version of a wgpu API object?

impl DeviceInterface for InstrumentedDevice {
    fn create_compute_pipeline(
        &self,
        desc: &wgpu::ComputePipelineDescriptor<'_>,
    ) -> wgpu::custom::DispatchComputePipeline {
        let pipeline = self.internal.create_compute_pipeline(desc);
        wgpu::custom::DispatchComputePipeline::custom(?)
    }
    // ...
}

To support this, you would have to implement ComputePipelineInterface for wgpu::ComputePipeline, and likewise for every other such pair of wgpu-level APIs. Since these traits and structs are all defined by wgpu, the implementations would have to live inside wgpu. That is a huge and pointless change.

@nical As I mentioned before, I don't care about the internal counters inside wgpu-core/hal. I want wgpu-level API instrumentation (for example, I want this to work on the web), and I want to use the current DispatchDevice as an extension point to implement this feature myself, outside of wgpu. The above PR just improves the DispatchDevice-related API to effectively support device-level proxying/hooking for my requirements.

mikialex avatar Jun 24 '25 16:06 mikialex

To support this, you would have to implement ComputePipelineInterface for wgpu::ComputePipeline, and likewise for every other such pair of wgpu-level APIs. Since these traits and structs are all defined by wgpu, the implementations would have to live inside wgpu. That is a huge and pointless change.

The intended way of using a custom backend is to create wrappers for all wgpu types. Given your limited use case (IIUC you only need to increment counters, so just wrapping the device and incrementing in the create_* methods should suffice) that does seem like overkill, but I'm not sure it's worth exposing some internals just for this.

sagudev avatar Jun 24 '25 17:06 sagudev

"The intended way of using a custom backend is to create wrappers for all wgpu types." Yes, this is what I'm trying to do now, but I have failed to do so. If the DispatchDevice is not exposed from wgpu::Device, then what is the recommended way to proxy wgpu::Device method calls using the current custom backend mechanism?

mikialex avatar Jun 24 '25 17:06 mikialex

"The intended way of using a custom backend is to create wrappers for all wgpu types." Yes, this is what I'm trying to do now, but I have failed to do so. If the DispatchDevice is not exposed from wgpu::Device, then what is the recommended way to proxy wgpu::Device method calls using the current custom backend mechanism?

You need to create wrappers for all the types; each wrapper internally holds the wgpu object:

struct CustomComputePipeline(wgpu::ComputePipeline);

impl ComputePipelineInterface for CustomComputePipeline {
// ...
}

impl DeviceInterface for InstrumentedDevice {
    fn create_compute_pipeline(
        &self,
        desc: &wgpu::ComputePipelineDescriptor<'_>,
    ) -> wgpu::custom::DispatchComputePipeline {
        let pipeline = self.internal.create_compute_pipeline(desc);
        wgpu::custom::DispatchComputePipeline::custom(CustomComputePipeline(pipeline))
    }
//...
}

sagudev avatar Jun 24 '25 17:06 sagudev