
Command Encoders are not getting dropped and leaking memory

Open coderedart opened this issue 2 years ago • 4 comments

Description

I created a command encoder and forgot to use it, and I observed that my app started leaking memory. It seems that if a command encoder is never submitted to the queue, it is leaked rather than dropped properly. At 75 fps, it was leaking 20+ MB/second.

Repro steps

Well, it's just a basic example:

  1. Create an instance, adapter, and device.
  2. In a loop, create a bunch of command encoders but don't use them. Sleep for a couple of milliseconds so the app doesn't run/quit too fast.
  3. Open Task Manager (or similar) and watch the app's memory usage grow slowly over time. On Linux, it started at around 20 MB and kept growing at about 1 MB/second, probably because wgpu creates smaller command pools when, unlike in my real app, nothing is being recorded into them.
use anyhow::Context;
use wgpu::CommandEncoderDescriptor;

async fn fake_main() -> anyhow::Result<()> {
    let instance = wgpu::Instance::new(wgpu::Backends::VULKAN);

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            force_fallback_adapter: false,
            compatible_surface: None,
        })
        .await
        .context("failed to request adapter")?;
    let (device, _) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: "overlay device".into(),
                features: Default::default(),
                limits: Default::default(),
            },
            None,
        )
        .await?;
    let mut counter = 0;
    loop {
        // created and immediately dropped: never recorded into, never submitted
        let _ = device.create_command_encoder(&CommandEncoderDescriptor {
            label: Some("command encoder"),
        });
        counter += 1;
        if counter > 10000 {
            break;
        }
        // sleep so the process lives long enough to watch its memory grow
        std::thread::sleep(std::time::Duration::from_millis(2));
    }
    Ok(())
}

fn main() {
    pollster::block_on(fake_main()).unwrap();
}

Expected vs observed behavior

I'm not using unsafe code like mem::forget, so I would expect the command encoder to clean up after itself. Otherwise, it must be documented at https://docs.rs/wgpu/latest/wgpu/struct.CommandEncoder.html that you MUST submit it or it will leak.

But right now, it doesn't clean up the command pools it allocated (checking RenderDoc, I had a few hundred command pools allocated in the (Vulkan) resource inspector tab), and the documentation doesn't state this explicitly.

Platform

wgpu version: 0.12 or master branch
OS: EndeavourOS (Arch)
GPU: 1070 Ti
GPU drivers: Nvidia proprietary (nvidia-dkms 510.54-1)
Backend: Vulkan

coderedart avatar Mar 23 '22 18:03 coderedart

Does inserting an empty submit every N operations cause the problem to go away?

cwfitzgerald avatar Mar 23 '22 23:03 cwfitzgerald

> Does inserting an empty submit every N operations cause the problem to go away?

Nope. I had about two command encoders per frame. The first was for clearing the screen (I didn't know that wgpu cleared it for me, so I thought I was actually using that encoder); I forgot to submit it, and it caused the leak. The second was for egui, which I submitted to the queue every frame, at the end, right before calling surface present.
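As an aside, the screen clear doesn't need a dedicated encoder at all: it can be folded into the first render pass that draws to the frame, as its load operation, so there is nothing extra to submit. A sketch of the color attachment descriptor (wgpu 0.13-style API; `view` is assumed to be the frame's texture view):

```rust
wgpu::RenderPassColorAttachment {
    view: &view,
    resolve_target: None,
    ops: wgpu::Operations {
        // Clear the frame to black when this pass begins;
        // no separate clear-only encoder is needed.
        load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
        store: true,
    },
}
```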

coderedart avatar Mar 24 '22 00:03 coderedart

I've reproduced this on some benchmarks, so it's definitely happening.

cwfitzgerald avatar May 28 '22 06:05 cwfitzgerald

This will need to be deferred until after 0.13; it's going to be fixed by a major refactor that I don't have time for this release.

cwfitzgerald avatar Jun 15 '22 03:06 cwfitzgerald

I'm also seeing a memory leak. Looking at dhat, I believe it could be due to command encoders not being removed from executing_command_encoders, although I am submitting the encoder. I've looked at the device/queue.rs file for probably twenty minutes, and maybe I'm dumb, but I can't figure out how an encoder would ever get removed from that list. I'll also post my render function in case that helps.

fn render(&mut self) -> Result<(), wgpu::SurfaceError> {
	// the new texture we can render to
	let output = self.surface.get_current_texture()?;
	let view = output
		.texture
		.create_view(&wgpu::TextureViewDescriptor::default());

	// this will allow us to send commands to the gpu
	let mut encoder = self
		.device
		.create_command_encoder(&wgpu::CommandEncoderDescriptor {
				label: Some("Render Encoder"),
		});

	let num_instances = self.instances.len() as u32; // draw() takes Range<u32>
	self.instances.fill_buffer(&self.device, &self.queue); // none of these functions cause the leak. i checked
	self.camera.refresh(&self.queue);
	self.textures.fill_textures(&self.queue);

	{
		let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
			label: Some("Render Pass"),
			color_attachments: &[Some(wgpu::RenderPassColorAttachment {
				view: &view,
				resolve_target: None,
				ops: wgpu::Operations {
					load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
					store: true,
				},
			})],
			depth_stencil_attachment: None,
		});

		render_pass.set_pipeline(&self.render_pipeline);
		render_pass.set_bind_group(0, self.camera.bind_group(), &[]);
		render_pass.set_bind_group(1, self.textures.bind_group(), &[]);
		render_pass.set_vertex_buffer(0, self.square_vertex_buffer.slice(..));
		render_pass.set_vertex_buffer(1, self.instances.buffer_slice());
		render_pass.draw(0..self.square_vertices, 0..num_instances);
	}
	// the encoder can't finish building the command buffer until the
	// render pass is dropped

	// submit the command buffer to the GPU
	profiling::scope!("submit render");
	self.queue.submit(std::iter::once(encoder.finish()));
	output.present();

	Ok(())
}

botahamec avatar Oct 20 '22 22:10 botahamec

@cwfitzgerald Are there any updates on this? I am able to reproduce the leak in an extremely simple wgpu-native project (see https://github.com/gfx-rs/wgpu-native/issues/350), and it's a bit concerning to me that a known memory leak has gone unaddressed for nearly two years with no externally visible activity for well over a year.

If there are no updates, are there any known workarounds?

shadowndacorner avatar Jan 20 '24 01:01 shadowndacorner

@cwfitzgerald Disregard my previous comment - as someone else pointed out in the above issue, the LearnWebGPU code had a resource leak and, since my code was based on it, mine did as well. Sorry about that!

shadowndacorner avatar Jan 20 '24 21:01 shadowndacorner