
Dropped devices leak memory on Windows dx12

Open kpreid opened this issue 2 years ago • 15 comments

Description

If I repeatedly create devices with substantial resources, then eventually I will get a failure to create a device (or, if one of the existing devices is used further, an allocation failure within it) even if the devices are dropped.

(This was discovered inside my application's test suite — I recognize that creating many devices is an irregular situation.)

Repro steps

Stick this test case in wgpu/tests/ and run it (branch with needed changes @ https://github.com/gfx-rs/wgpu/compare/master...kpreid:wgpu:winleak):

use crate::common::{initialize_adapter, initialize_device};

use wasm_bindgen_test::wasm_bindgen_test;
use wgpu::*;

#[wasm_bindgen_test]
#[test]
fn allocate_large_3d_textures_in_devices() {
    let (adapter, _surface_guard) = initialize_adapter();

    for i in 0..100 {
        let (device, _queue) = pollster::block_on(initialize_device(
            &adapter,
            Features::empty(),
            Limits::default(),
        ));

        println!("{i}");
        // Each one of these textures occupies at least 256*256*256*4 ≈ 67MB.
        let _texture = device.create_texture(&TextureDescriptor {
            label: None,
            size: Extent3d {
                width: 256,
                height: 256,
                depth_or_array_layers: 256,
            },
            mip_level_count: 1,
            sample_count: 1,
            dimension: TextureDimension::D3,
            format: TextureFormat::Rgba8UnormSrgb,
            view_formats: &[],
            usage: TextureUsages::TEXTURE_BINDING | TextureUsages::COPY_DST,
        });
    }
}

The leak does not (visibly) occur if the same textures are created and dropped while using a single device.
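
For reference, a sketch of that single-device variant (reusing the same helpers and texture descriptor as the test above, with device creation hoisted out of the loop):

#[test]
fn allocate_large_3d_textures_in_one_device() {
    let (adapter, _surface_guard) = initialize_adapter();
    let (device, _queue) = pollster::block_on(initialize_device(
        &adapter,
        Features::empty(),
        Limits::default(),
    ));

    for i in 0..100 {
        println!("{i}");
        // The texture is dropped at the end of each iteration; no leak is
        // visible in this variant.
        let _texture = device.create_texture(&TextureDescriptor {
            label: None,
            size: Extent3d {
                width: 256,
                height: 256,
                depth_or_array_layers: 256,
            },
            mip_level_count: 1,
            sample_count: 1,
            dimension: TextureDimension::D3,
            format: TextureFormat::Rgba8UnormSrgb,
            view_formats: &[],
            usage: TextureUsages::TEXTURE_BINDING | TextureUsages::COPY_DST,
        });
    }
}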

Platform

  • wgpu 0.15 or git master (b534379acbd16d427e917b382d19d2e16000d829).
  • I believe this did not happen with wgpu 0.14.
  • GitHub Actions Windows runner
Adapter 0:
	   Backend: Dx12
	      Name: "Microsoft Basic Render Driver"
	  VendorID: 5140
	  DeviceID: 140

kpreid avatar Feb 19 '23 01:02 kpreid

Hmm, both the driver and the nature of the issue (memory leak, maybe a refcounting issue?) seem like they might be related to #3485?

ErichDonGubler avatar Feb 20 '23 23:02 ErichDonGubler

I've also seen similar issues on Windows. I haven't had time to investigate them yet, so I don't have any details, but something like that is definitely happening.

AdrianEddy avatar Feb 21 '23 02:02 AdrianEddy

Disabling dx12 suballocation makes the test pass on my system.

--- a/wgpu/Cargo.toml
+++ b/wgpu/Cargo.toml
@@ -142,7 +142,7 @@ hal = { workspace = true }
 hal = { workspace = true, features = ["renderdoc"] }

 [target.'cfg(windows)'.dependencies]
-hal = { workspace = true, features = ["dxc_shader_compiler", "renderdoc", "windows_rs"] }
+hal = { workspace = true, features = ["dxc_shader_compiler", "renderdoc"] }

If you have a non-test reproduction, gpu-allocator should be outputting warn level logs if it's leaking memory when we're dropping it.
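
For anyone trying that, a minimal sketch of wiring up a logger so those warnings become visible (assuming the env_logger crate; any log-compatible logger works):

fn main() {
    // gpu-allocator reports leaked allocations through the `log` crate at
    // warn level when the allocator is dropped, so a logger must be
    // installed before the device (and its allocator) is torn down.
    env_logger::builder()
        .filter_level(log::LevelFilter::Warn)
        .init();

    // ... create the instance/adapter/devices as in the repro above ...
}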

Elabajaba avatar Feb 21 '23 18:02 Elabajaba

I turned your test code into an example and shoved a println! into Device::exit(...) (in wgpu-hal/src/dx12/device.rs), and it looks like Device::exit doesn't get called until all the loop iterations complete (Vulkan seems to have the same issue, but for whatever reason doesn't end up with the same kind of memory bloat).

edit: After more testing, Device::exit() seems to only run promptly if the instance creation is inside the loop. With the instance outside the loop (as in the OP), Device::exit(...) runs for every created device only after the loop has completed and the instance is dropped.

Elabajaba avatar Feb 21 '23 19:02 Elabajaba

Some more context:

The texture seems to be freed properly (println debugging shows destroy_texture() being called), and it leaks even without the texture.

The issue is that when you create a new gpu-allocator allocator (which happens whenever we create a new device), it allocates a large block of memory from which it suballocates smaller blocks for buffers and textures, and there's a bug somewhere that stops devices from being dropped until the Instance is dropped. (Sidenote: I'd almost consider Vulkan not leaking memory here a bug, as I thought our gpu-alloc implementation worked similarly to gpu-allocator with regard to creating and holding onto large ~256MB blocks of memory to suballocate from.)
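
To make the cost concrete, here is an illustrative sketch of the suballocation pattern described above (not gpu-allocator's actual code; the single block and bump-pointer strategy are simplifications):

// One of these exists per device; `block_size` is the large up-front
// reservation (on the order of 256MB) that stays pinned for every device
// that is never actually dropped.
struct SubAllocator {
    block_size: u64,  // reserved from the driver at device creation
    next_offset: u64, // bump pointer into the block
}

impl SubAllocator {
    fn new(block_size: u64) -> Self {
        // The real allocator would allocate the large GPU heap here.
        Self { block_size, next_offset: 0 }
    }

    // Hand out an offset into the big block instead of asking the driver
    // for a fresh allocation per buffer/texture.
    fn allocate(&mut self, size: u64, align: u64) -> Option<u64> {
        // Round the bump pointer up to the requested alignment.
        let rem = self.next_offset % align;
        let offset = if rem == 0 {
            self.next_offset
        } else {
            self.next_offset + (align - rem)
        };
        if offset.checked_add(size)? > self.block_size {
            return None; // the real allocator would grow by adding blocks
        }
        self.next_offset = offset + size;
        Some(offset)
    }
}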

Elabajaba avatar Feb 21 '23 21:02 Elabajaba

The issue is that when you create a new gpu-allocator allocator (which happens whenever we create a new device), it allocates a large block of memory from which it suballocates smaller blocks for buffers and textures, and there's a bug somewhere that stops devices from being dropped until the Instance is dropped.

Ironically, the reasoning behind my original code creating many Devices was “Well, a Device is the largest stateful-in-the-API unit, so creating a new one per test case is probably the best tradeoff between test isolation and performance”. If that's in fact not a good option (perhaps for reasons beyond this particular delayed dealloc bug) then it would be nice if there were clues in the API documentation.

kpreid avatar Feb 22 '23 05:02 kpreid

(This was discovered inside my application's test suite — I recognize that creating many devices is an irregular situation.)

Firefox needs to be able to create endless numbers of devices without leaking, because device creation is under web content's control. We could decouple WebGPU API Devices from wgpu Devices, but it seems just as easy to fix wgpu, and that'd help a broader audience.

Ironically, the reasoning behind my original code creating many Devices was “Well, a Device is the largest stateful-in-the-API unit, so creating a new one per test case is probably the best tradeoff between test isolation and performance”. If that's in fact not a good option (perhaps for reasons beyond this particular delayed dealloc bug) then it would be nice if there were clues in the API documentation.

Creating separate devices for isolation seems like something that ought to work.

jimblandy avatar Sep 27 '23 17:09 jimblandy

Tests for memory leaks could potentially use https://crates.io/crates/sysinfo to get such information, but it would not help the case where we're leaking gpu memory.
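
Something like this, as a sketch (written against the 0.29-era sysinfo API with the SystemExt/ProcessExt traits; names differ in newer releases, and as noted this only observes host memory, not GPU heaps):

use sysinfo::{get_current_pid, ProcessExt, System, SystemExt};

// Resident memory of the current process, for asserting that repeated
// device create/drop cycles don't grow host memory without bound.
fn current_process_memory() -> u64 {
    let mut sys = System::new();
    sys.refresh_processes();
    let pid = get_current_pid().expect("current pid");
    // `memory()` returns bytes in recent sysinfo versions (KiB in older ones).
    sys.process(pid).map(|p| p.memory()).unwrap_or(0)
}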

cwfitzgerald avatar Sep 27 '23 18:09 cwfitzgerald

Are the applications calling poll properly? It seems like dx12's Device::exit gets called in a pretty straightforward way if Global::exit_device is ever called at all. That should always happen if the device is genuinely unreferenced and we're calling poll_devices.

jimblandy avatar Sep 27 '23 19:09 jimblandy

Are the applications calling poll properly?

In the situation which prompted me to file this issue, they were — each device belonged to a headless render test, so it would poll to retrieve the rendered image. Of course, the test doesn't poll after it completes, and won't poll if it somehow panics before getting to that point.

If all that is needed to avoid a leak is to poll after the resources are garbage, maybe wgpu should automatically poll just before creating a device?
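
Concretely, the workaround under discussion amounts to something like this at the end of each test (a sketch against the 0.15-era API; Maintain has changed shape in later releases):

// Drain outstanding work so deferred destruction actually runs before the
// device goes away; without this, cleanup waits for the Instance to drop.
device.poll(wgpu::Maintain::Wait);
drop(device);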

kpreid avatar Sep 27 '23 22:09 kpreid

This seems to be fixed on wgpu 0.19.

Elabajaba avatar Jan 18 '24 22:01 Elabajaba

I have just learned that the WebGPU specification says that you can only requestDevice() once per Adapter — you're supposed to request a fresh Adapter to request another device. (See #5764 for more.) So, the test case I filed this issue with is incorrect — it should fail when it tries to use the same Adapter a second time. This issue might therefore be entirely moot, or not; I don't know enough about wgpu internals to say.

kpreid avatar Jun 01 '24 03:06 kpreid

Right, D3D12 will also give you back the same device from the same adapter (they are singletons per adapter).

teoxoy avatar Jun 03 '24 10:06 teoxoy

@teoxoy I think you misunderstand. I'm saying that, according to the WebGPU specification, calling requestDevice() more than once on a given Adapter value should return an error. An Adapter value isn't in a 1:1 relationship with the hardware; it's a handle to the opportunity to create one Device value. Therefore, the question of what happens when you have multiple Devices created from one Adapter is moot — that's prohibited. But none of that has anything to do with the cardinality of the underlying driver/platform-API adapter and device entities.

So, the original test code in principle ought to be rewritten to repeatedly call initialize_adapter, not just initialize_device, to make it correct. This API-usage fix is independent of whether or not there's (still) a memory leak, unless the leak only occurs in this case, in which case the bug could be fixed simply by making wgpu spec-conformant.
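
In other words, the corrected loop would look roughly like this (a sketch against the 0.15-era raw API, assuming an instance: wgpu::Instance created outside the loop):

for i in 0..100 {
    // Request a fresh adapter for every device, per the WebGPU spec.
    let adapter = pollster::block_on(
        instance.request_adapter(&RequestAdapterOptions::default()),
    )
    .expect("no suitable adapter");
    let (device, _queue) = pollster::block_on(
        adapter.request_device(&DeviceDescriptor::default(), None),
    )
    .expect("failed to create device");

    println!("{i}");
    // ... create the large 3D texture exactly as in the OP; the adapter,
    // device, and texture are all dropped at the end of each iteration ...
}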

kpreid avatar Jun 03 '24 15:06 kpreid

I agree we should implement the validation; I was trying to provide relevant info as to why we are leaking without the validation implemented. After all, this validation is most likely in the spec because of D3D12's behavior.

teoxoy avatar Jun 04 '24 19:06 teoxoy

Disabling dx12 suballocation makes the test pass on my system.

https://github.com/gfx-rs/wgpu/pull/5943 fixed this but I'll leave this open since the original issue hasn't been solved.

teoxoy avatar Jul 11 '24 12:07 teoxoy

The repro in the OP is no longer leaking. This was fixed by 6e21f7a9291db4395192d6b510d906978ae2d251 (https://github.com/gfx-rs/wgpu/pull/3626).

teoxoy avatar Jul 30 '24 17:07 teoxoy