
Memory usage on the Game of Life example

eduardobraun opened this issue 3 years ago • 14 comments

When running the Game of Life example, both in debug and release mode, I experience jumps in memory usage while painting some quads in the canvas. As you can see in the image, with just a few quads it's using 14GB of RAM, about 12GB of which come from the example. image

The problem seems to be with the growing of vectors, since the memory usage doubles, but I did not test it extensively. Since the program uses close to no memory unless I draw quads, I did some rough math assuming no reuse of vertices when indexing:

vertex: 16 bytes per quad
indices: 24 bytes per quad
memory: 1073741824 bytes in one GB * 12
total: memory / quadmem = 322122547.2 quads

I did not count the number of quads on the screen, but it is nowhere near that. For reference, the number of pixels on a 4K screen is 8294400.
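Redoing that arithmetic as a quick sanity check (all numbers taken from the estimate above):

fn main() {
    let bytes_per_quad = 16.0 + 24.0; // vertex bytes + index bytes, per the estimate above
    let total_bytes = 12.0 * 1_073_741_824.0; // roughly 12 GB attributed to the example
    let quads = total_bytes / bytes_per_quad;
    // Prints about 322122547 quads, which dwarfs the 8294400 pixels of a 4K screen.
    println!("{quads:.1} quads would be needed to account for that memory");
}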

Is that a problem with the example implementation or is that amount of memory consumption something I should expect when using the iced canvas?

Edit: I tried to profile the example and the results are closer to what I expected: image However, the overall system memory usage still goes beyond 12GB. I have no clue what is going on. image

eduardobraun avatar Mar 20 '21 11:03 eduardobraun

I updated the wgpu dependency to use the master branch from GitHub, and now it's panicking in StagingBelt::write_buffer():

thread 'main' panicked at 'Error in Buffer::get_mapped_range: buffer offset invalid: offset 68 must be multiple of 8', /home/peer/.cargo/git/checkouts/wgpu-rs-40ea39809c03c5d8/645dc90/src/backend/direct.rs:112:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:493:5
   1: std::panicking::begin_panic_fmt
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:435:5
   2: wgpu::backend::direct::Context::handle_error_fatal
   3: <wgpu::backend::direct::Context as wgpu::Context>::buffer_get_mapped_range
   4: wgpu::BufferSlice::get_mapped_range_mut
   5: wgpu::util::belt::StagingBelt::write_buffer
   6: iced_wgpu::quad::Pipeline::draw
   7: iced_wgpu::backend::Backend::flush
   8: iced_wgpu::backend::Backend::draw
   9: <iced_wgpu::window::compositor::Compositor as iced_graphics::window::compositor::Compositor>::draw
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  11: winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::run
  12: winit::platform_impl::platform::EventLoop<T>::run
  13: winit::event_loop::EventLoop<T>::run
  14: iced_winit::application::run
  15: game_of_life::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Memory block wasn't deallocated
Memory block wasn't deallocated
Memory block wasn't deallocated
[1]    28053 segmentation fault (core dumped)  RUST_BACKTRACE=1 target/release/game_of_life

My guess is that if the offset is not a multiple of 8, it was leaking? No idea.
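For context, the panic is wgpu complaining that a mapped range must start at an 8-byte-aligned offset, so whatever slices into the staging buffer has to round its write offsets up. A minimal sketch of that rounding (the alignment value is an assumption here; wgpu exposes a constant for it):

fn align_up(offset: u64, alignment: u64) -> u64 {
    // Round `offset` up to the next multiple of `alignment` (a power of two).
    (offset + alignment - 1) & !(alignment - 1)
}

fn main() {
    let map_alignment = 8; // mapped-range alignment assumed to be 8 bytes
    assert_eq!(align_up(68, map_alignment), 72); // 68 is the offending offset from the panic above
}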

eduardobraun avatar Mar 20 '21 14:03 eduardobraun

Changing versions of wgpu is not a simple task. This PR is what was required just to switch between release versions.

13r0ck avatar Mar 20 '21 14:03 13r0ck

Well, maybe it will not work, but the tests are passing. And my point is that it is now panicking in a function related to buffers, saying that the offset must be a multiple of 8. No one adds a fatal check if it is not important.

eduardobraun avatar Mar 20 '21 15:03 eduardobraun

Can you try running the example with the iced revision that was still on wgpu 0.5? The commit hash for that should be fb015a85d22a7c4632bd251127a89259bfd0c346. I've noticed that memory usage in general has increased substantially ever since upgrading from 0.5.

Ssaely avatar Mar 20 '21 15:03 Ssaely

     Running `target/release/game_of_life`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: AllocationError(OutOfMemory(Device))', /home/peer/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.5.6/src/device/mod.rs:324:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Same issue

eduardobraun avatar Mar 20 '21 15:03 eduardobraun

Hm, I only encountered that issue when minimizing the window on Windows. Sounds like it's related to https://github.com/hecrj/iced/issues/551 & https://github.com/hecrj/iced/issues/603#issuecomment-727520902

Ssaely avatar Mar 20 '21 16:03 Ssaely

Well, I use Sway as my window manager, so technically my window is always the same size. And the memory usage only rises when I actively draw quads on the screen. It's weird that it's the application's virtual memory that increases, yet every GB of that virtual memory is taken from my main memory.

eduardobraun avatar Mar 20 '21 16:03 eduardobraun

So I did a profile with valgrind this time: image

eduardobraun avatar Mar 20 '21 18:03 eduardobraun

I've been able to "fix" the issue by replacing the staging belt recall with a freshly created belt:

        // Recall staging buffers
        // self.local_pool
        //     .spawner()
        //     .spawn(self.staging_belt.recall())
        //     .expect("Recall staging belt");

        // self.local_pool.run_until_stalled();
        self.staging_belt = wgpu::util::StagingBelt::new(Self::CHUNK_SIZE);

in wgpu/src/window/compositor.rs

I also tried using async_std::task::block_on instead of the local_pool:

        async_std::task::block_on(async {
            self.staging_belt.recall().await;
        });

but it seems that the recall is blocking forever.
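A possible explanation for the hang: the recall future likely only completes once the buffer-mapping callbacks have run, and those callbacks only run when the device is polled, so blocking the only thread without ever polling would never make progress. A rough sketch of what might be needed instead (assuming the wgpu::Device is reachable at this point in the compositor):

        // Sketch only: issue the recall, let the pending map_async callbacks fire,
        // then block on the (now completable) future.
        let recall = self.staging_belt.recall();
        device.poll(wgpu::Maintain::Wait); // processes outstanding buffer mappings
        async_std::task::block_on(recall);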

eduardobraun avatar Mar 21 '21 20:03 eduardobraun

This is very interesting, I will try to test this for the scrollable as well.

Do you mind sharing the dependencies in your Cargo.toml?

GiulianoCTRL avatar Mar 22 '21 09:03 GiulianoCTRL

The issue happens out of the box with the game of life example; I currently have it switched from tokio to async-std as I was testing:

[dependencies]
iced = { path = "../..", features = ["canvas", "async-std"] }
# tokio = { version = "1.0", features = ["sync"] }
async-std = "1.0"
itertools = "0.9"
rustc-hash = "1.1"

eduardobraun avatar Mar 22 '21 09:03 eduardobraun

I can reproduce when using iced_wgpu.

However, the iced_glow backend is unaffected. In that case, the game_of_life example takes only around 30-40 MB. Therefore, everything seems to point towards an issue with the StagingBelt in wgpu.

The next step would be to create an SSCCE using only wgpu and report the issue to its authors.

hecrj avatar Mar 27 '21 09:03 hecrj

I gave another look at the issue. After adding a debug function to wgpu::util::StagingBelt:

impl StagingBelt {
    pub fn print(&self) {
        println!("Active: {} Closed: {} Free: {}", self.active_chunks.len(), self.closed_chunks.len(), self.free_chunks.len());
    }
...
}

I got the following:

Draw
Active: 2 Closed: 0 Free: 12445
Finish
Active: 0 Closed: 2 Free: 12445
Submit
Active: 0 Closed: 2 Free: 12445
Recall
Active: 0 Closed: 0 Free: 12447
Draws: 15594 Recalls: 15593
Draw
Active: 2 Closed: 0 Free: 12445
Finish
Active: 0 Closed: 2 Free: 12445
Submit
Active: 0 Closed: 2 Free: 12445
Recall
Active: 0 Closed: 0 Free: 12451
Draws: 15595 Recalls: 15593
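For reference, a rough sketch of where such print() calls would sit within a frame to produce the trace above (the exact call sites in iced_wgpu are an assumption on my part):

// Sketch: instrument one frame of the compositor (variable names assumed).
println!("Draw");
staging_belt.print();                     // chunks used while encoding the frame
println!("Finish");
staging_belt.finish();
staging_belt.print();                     // active chunks have moved to closed
println!("Submit");
queue.submit(Some(encoder.finish()));
staging_belt.print();
println!("Recall");
// recall() is spawned on the local pool and the pool is run until stalled
staging_belt.print();                     // closed chunks have moved back to free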

The StagingBelt keeps allocating new chunks even though it has many free chunks. So, looking at the allocation logic in src/util/belt.rs (around line 97):

    pub fn write_buffer(
        &mut self,
        encoder: &mut CommandEncoder,
        target: &Buffer,
        offset: BufferAddress,
        size: BufferSize,
        device: &Device,
    ) -> BufferViewMut {

        let mut chunk = if let Some(index) = self
            .active_chunks
            .iter()
            .position(|chunk| chunk.offset + size.get() <= chunk.size)
        {
            self.active_chunks.swap_remove(index)
        } else if let Some(index) = self
            .free_chunks
            .iter()
            .position(|chunk| size.get() <= chunk.size)
        {
            self.free_chunks.swap_remove(index)
        } else {
            let size = self.chunk_size.max(size.get());
            Chunk {
                buffer: device.create_buffer(&BufferDescriptor {
                    label: Some("staging"),
                    size,
                    usage: BufferUsage::MAP_WRITE | BufferUsage::COPY_SRC,
                    mapped_at_creation: true,
                }),
                size,
                offset: 0,
            }
        };
...
}

We see that it will only reuse a free chunk if the requested size is less than or equal to that chunk's size. Since the number of vertices and indices in the game of life canvas only goes up as you add more geometry, each request outgrows every existing chunk, so old chunks are never reused and the belt has no way to release the free chunks it accumulates.
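To illustrate the consequence, here is a small self-contained simulation of that selection logic (plain Rust, no wgpu; chunk size and growth rate are made up), assuming the requested size grows a little every frame the way the canvas geometry does:

fn main() {
    let chunk_size: u64 = 128 * 1024;            // hypothetical CHUNK_SIZE
    let mut free_chunks: Vec<u64> = Vec::new();  // sizes of the free chunks
    let mut request: u64 = chunk_size;           // bytes needed this frame

    for frame in 0..10 {
        // Reuse a free chunk only if the requested size fits into it...
        let size = match free_chunks.iter().position(|&s| request <= s) {
            Some(i) => free_chunks.swap_remove(i),
            None => {
                // ...otherwise allocate a brand-new chunk sized to the request.
                let size = chunk_size.max(request);
                println!("frame {frame}: allocating a new {size}-byte chunk");
                size
            }
        };
        // After recall the chunk goes back to the free list and is never dropped.
        free_chunks.push(size);
        request += 4 * 1024; // the canvas grew, so the next request is larger
    }
    // Every frame allocated a fresh chunk; all of them stay alive in the free list.
    println!("free chunks kept alive: {}", free_chunks.len());
}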

eduardobraun avatar May 08 '21 12:05 eduardobraun

After adding a naive garbage collector to the StagingBelt:

    // Requires a `tick` counter on the belt and a `tick` field on each Chunk.
    pub fn tick(&mut self) {
        self.tick += 1;
    }

    pub fn gc(&mut self) {
        let t = self.tick;
        // Drop free chunks that have not been reused within the last 5 ticks.
        self.free_chunks.retain(|chunk| (t - chunk.tick) < 5);
    }

The memory usage stays constant at ~2.8GB (virtual memory).
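For completeness, this assumes each chunk's tick is recorded when it returns to the free list, and that the compositor drives the collector once per frame; a sketch of that wiring (the call site is assumed), placed right after the existing recall of the staging buffers:

        // Sketch: once per frame, after the staging buffers have been recalled,
        // advance the frame counter and collect chunks that were not reused recently.
        self.staging_belt.tick(); // bump the counter used by gc()
        self.staging_belt.gc();   // drop free chunks unused for the last 5 frames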

eduardobraun avatar May 08 '21 13:05 eduardobraun