The memory usage is extremely high!
Recently, I've been trying to learn the Xilem framework. I really like it, and I'd like to express my gratitude for your efforts. However, I've found that Xilem has very high memory usage: a simple "Hello, World" program consumes 430MB of memory, even in release mode. I have no idea where the problem lies. It would be great if Xilem could optimize its memory usage in the future.
main.rs
#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")] // hide console window on Windows in release
use xilem::dpi::LogicalSize;
use xilem::palette::css;
use xilem::view::{Padding, label, sized_box};
use xilem::winit::error::EventLoopError;
use xilem::winit::window::Window;
use xilem::{EventLoop, EventLoopBuilder};
use xilem::{WidgetView, Xilem};
struct WidgetGallery {
    text: String,
}

fn app_logic(data: &mut WidgetGallery) -> impl WidgetView<WidgetGallery> + use<> {
    sized_box(label(format!("Hello World: {}", data.text)).brush(css::BLACK))
        .background(css::WHITE)
        .border(css::RED, 2.)
        .padding(Padding::all(20.))
}

fn run(event_loop: EventLoopBuilder) -> Result<(), EventLoopError> {
    let data = WidgetGallery { text: String::from("Xilem") };
    let app = Xilem::new(data, app_logic);
    let window_attributes = Window::default_attributes()
        .with_title("Xilem Widgets")
        .with_resizable(true)
        .with_min_inner_size(LogicalSize::new(400., 400.))
        .with_inner_size(LogicalSize::new(650., 500.));
    app.run_windowed_in(event_loop, window_attributes)?;
    Ok(())
}

fn main() -> Result<(), EventLoopError> {
    run(EventLoop::with_user_event())
}
Cargo.toml
[package]
name = "learn_xilem"
version = "0.1.0"
edition = "2024"
[dependencies]
xilem = { git = "https://github.com/linebender/xilem.git" }
[profile.release]
lto = true
strip = true
opt-level = 3
panic = "abort"
codegen-units = 1
[profile.dev]
opt-level = 1
Welcome! I'm glad you're having a good experience using Xilem.
High memory usage is a known problem - we do need to track it down, and the exact causes of it aren't completely clear. This hasn't been a priority for us, as the memory usage has not been high enough to be blocking any use cases (as we don't support running on the web, for example). It is definitely important to reduce it, though.
From a project management perspective, I'd suggest we should use this issue to track "determining major sources of memory usage".
One likely source of this is related to https://github.com/linebender/vello/issues/366, where we allocate buffers of a fixed size on the GPU, and these sizes are not tunable. However, I have not yet quantified exactly how much memory these buffers take. Doing so would involve adding up the numbers in https://github.com/linebender/vello/blob/d2399b9143155362d53bd7521cd208d09bd2966c/vello_encoding/src/config.rs#L397-L404 (multiplied by the item sizes).
I was curious about this too. I ran a simple malloc profiler on the given Xilem example, which revealed only ~5MB of peak memory usage, so I decided to do what @DJMcNab suggested.
I got this table of results running the "simple" vello example. Total memory allocated here looks to be ~170MB, which still leaves a good chunk of memory unaccounted for on the Vello side (htop showed 239MB total resident usage for me in the vello example).
path_reduced : 20 bytes
path_reduced2 : 5120 bytes
path_reduced_scan : 20 bytes
path_monoids : 5120 bytes
path_bboxes : 96 bytes
draw_reduced : 16 bytes
draw_monoids : 64 bytes
info : 16 bytes
clip_inps : 8 bytes
clip_els : 32 bytes
clip_bics : 8 bytes
clip_bboxes : 16 bytes
draw_bboxes : 64 bytes
bump_alloc : 32 bytes
indirect_count : 16 bytes
lines : 50331648 bytes
bin_headers : 2048 bytes
paths : 8192 bytes
bin_data : 1048576 bytes
tiles : 16777216 bytes
seg_counts : 16777216 bytes
segments : 50331648 bytes
blend_spill : 4194304 bytes
ptcl : 33554432 bytes
total : 173035928 bytes
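For anyone who wants to double-check the arithmetic without rebuilding Vello, the table above can be re-totaled independently. This is a standalone sanity check over the published numbers, not Vello code:

```rust
// Independent re-total of the per-buffer byte counts from the table above.
fn main() {
    let sizes: &[(&str, u64)] = &[
        ("path_reduced", 20),
        ("path_reduced2", 5120),
        ("path_reduced_scan", 20),
        ("path_monoids", 5120),
        ("path_bboxes", 96),
        ("draw_reduced", 16),
        ("draw_monoids", 64),
        ("info", 16),
        ("clip_inps", 8),
        ("clip_els", 32),
        ("clip_bics", 8),
        ("clip_bboxes", 16),
        ("draw_bboxes", 64),
        ("bump_alloc", 32),
        ("indirect_count", 16),
        ("lines", 50_331_648),
        ("bin_headers", 2048),
        ("paths", 8192),
        ("bin_data", 1_048_576),
        ("tiles", 16_777_216),
        ("seg_counts", 16_777_216),
        ("segments", 50_331_648),
        ("blend_spill", 4_194_304),
        ("ptcl", 33_554_432),
    ];
    let total: u64 = sizes.iter().map(|&(_, bytes)| bytes).sum();
    println!("total: {total} bytes"); // 173035928, matching the macro's output
    assert_eq!(total, 173_035_928);
}
```

Worth noting: just four buffers (lines, segments, ptcl, tiles) account for over 85% of the fixed allocation, so any tuning effort should probably start there.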
Here is the macro I used to generate the table.
macro_rules! print_sizes {
    ( $( $x:ident ),* ) => {
        {
            let mut total = 0;
            $(
                {
                    const fn size<T>(_: BufferSize<T>) -> usize {
                        std::mem::size_of::<T>()
                    }
                    println!("{:<20}: {} bytes", stringify!($x), size($x) * $x.len as usize);
                    total += size($x) * $x.len as usize;
                }
            )*
            println!("{:<20}: {} bytes", "total", total);
        }
    };
}
Thanks, that calculation looks right. Using the size_in_bytes method on BufferSize would probably be clearer in this case.
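To illustrate the suggestion, here is a sketch of how the macro simplifies if each buffer can report its own byte size. Note that `BufferSize` below is a hypothetical stand-in for the real type in `vello_encoding`, defined here only so the example is self-contained; the real type's `size_in_bytes` may differ in signature:

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for vello_encoding's `BufferSize<T>`: an element
// count that can report its total byte size.
struct BufferSize<T> {
    len: u32,
    _phantom: PhantomData<T>,
}

impl<T> BufferSize<T> {
    fn new(len: u32) -> Self {
        Self { len, _phantom: PhantomData }
    }
    // Total bytes = element count * element size.
    fn size_in_bytes(&self) -> u64 {
        self.len as u64 * std::mem::size_of::<T>() as u64
    }
}

// With `size_in_bytes`, the macro no longer needs the inner `size` helper.
macro_rules! print_sizes {
    ( $( $x:ident ),* $(,)? ) => {{
        let mut total: u64 = 0;
        $(
            println!("{:<20}: {} bytes", stringify!($x), $x.size_in_bytes());
            total += $x.size_in_bytes();
        )*
        println!("{:<20}: {} bytes", "total", total);
    }};
}

fn main() {
    // Element counts chosen to reproduce two rows of the table above.
    let lines = BufferSize::<[f32; 4]>::new(3_145_728); // 16-byte elements
    let tiles = BufferSize::<u64>::new(2_097_152);      // 8-byte elements
    print_sizes!(lines, tiles); // total: 67108864 bytes
}
```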
There's also the scene buffer (effectively counted at least twice much of the time, because we use a staging buffer per run to upload its data), which allocates memory based on the total size of the scene after resolution and isn't tracked in the buffer sizes above. It's also worth noting that some of those buffers will grow with a more complex scene, and I don't think we currently deallocate buffers in Vello once they've grown. It might be worth using something like RenderDoc to look at the sizes of all active buffers (if you can get it to work on your machine).
@Majora320 Thanks, but how do I use this macro in my case? Like this?
fn main() -> Result<(), EventLoopError> {
    let r = run(EventLoop::with_user_event());
    print_sizes!(r);
    r
}
That macro can only be used in Vello's own code, at the location I linked.
I re-ran this code using the latest master branch of Xilem, and the memory usage dropped from 430MB to 380MB. I tried other GUI frameworks in the Rust community that also use the WGPU renderer, and their memory usage is typically around 50MB.
I would like to ask if this issue is still being tracked.
I think I can pretty confidently say it's not Fontique. Blitz has the ability to run without loading system fonts, and at least with how I'm measuring memory (the RES metric from htop), disabling the parley/system_fonts feature (which causes text not to visibly render) makes no difference (at most 1-2MB). This is on macOS, which should be one of the worst backends for Fontique memory usage, as it eagerly loads every font on the system; but it uses mmap, so, as I'm seeing, this shouldn't contribute to resident memory usage.
A possible lead: this may partly be due to WGPU reserving a large number of descriptors by default (https://github.com/gfx-rs/wgpu/blob/37bd31ce5d53815bf3eebf0e545deabf13ca83a7/wgpu-hal/src/dx12/adapter.rs#L681). Found via Iced's discord https://discord.com/channels/628993209984614400/1021828532189528094/1428229915567980544 and not validated.
From a similar issue raised against Blitz https://github.com/DioxusLabs/blitz/issues/286#issuecomment-3480392249, it's looking very much like Vello and/or WGPU are the problem(s):
It's worth noting that memory usage may vary depending on the system and CPU model. Compared to my current PC with a Core i9-12900HX running Windows 11 Pro, the default build shows worse memory performance on my other PC equipped with an AMD 5560U running Windows 10 Pro: its memory usage at startup exceeds 400 MB.
However, the Vello CPU + Softbuffer build still maintains excellent memory performance on this system, using no more than 20 MB.
When starting: [screenshot]
After adding 10 todos: [screenshot]
("default.exe" above is using Vello) (Blitz's Skia setup is known to have a bad memory leak, so ignore the very high "after add 10 todo" memory usage for Skia)
The Ribir project has identified the WGPU 0.24 upgrade as problematic, and is proposing initialising WGPU with MemoryHints::MemoryUsage as a fix (https://github.com/RibirX/Ribir/pull/797)
UPDATE1: unfortunately setting this seems to make no difference in my testing.
UPDATE2:
Comments mention this being relevant to the Vulkan backend, so it may well help on non-Apple systems.
They also mention MemoryHints::Performance being suitable for games and MemoryHints::MemoryUsage for applications. So I definitely think we ought to be setting this.
The actual values the hints correspond to are here, and the difference is not subtle:
let perf_cfg = gpu_alloc::Config {
    starting_free_list_chunk: 128 * mb,
    final_free_list_chunk: 512 * mb,
    minimal_buddy_size: 1,
    initial_buddy_dedicated_size: 8 * mb,
    dedicated_threshold: 32 * mb,
    preferred_dedicated_threshold: mb,
    transient_dedicated_threshold: 128 * mb,
};

let mem_usage_cfg = gpu_alloc::Config {
    starting_free_list_chunk: 8 * mb,
    final_free_list_chunk: 64 * mb,
    minimal_buddy_size: 1,
    initial_buddy_dedicated_size: 8 * mb,
    dedicated_threshold: 8 * mb,
    preferred_dedicated_threshold: mb,
    transient_dedicated_threshold: 16 * mb,
};
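For reference, here is a sketch (not validated) of what setting the hint would look like when requesting a device. The field names match the wgpu 24.x `DeviceDescriptor`; the label and surrounding setup are placeholders, and the exact signature should be checked against the wgpu version actually in use:

```rust
// Sketch: request a wgpu device with MemoryHints::MemoryUsage, as the
// Ribir PR proposes. Assumes an `adapter: wgpu::Adapter` is in scope.
let (device, queue) = adapter
    .request_device(
        &wgpu::DeviceDescriptor {
            label: Some("vello-device"), // placeholder label
            required_features: wgpu::Features::empty(),
            required_limits: wgpu::Limits::default(),
            // Favor a lower memory footprint over allocation speed.
            memory_hints: wgpu::MemoryHints::MemoryUsage,
        },
        None, // trace path
    )
    .await?;
```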
As I said in https://github.com/linebender/vello/issues/1234 (which GitHub seems to have somehow misplaced), my working theory is that this is due to Vello's not-very-smart internal buffer cache. My recommended next steps, for anyone for whom this is a priority, are to measure how much memory that cache holds, and to see whether avoiding reinserting buffers into it fixes the memory usage (with the awareness that this will likely tank performance).
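To make the theory concrete, here is a deliberately naive model of the failure mode. This is NOT Vello's actual cache code, just a hypothetical illustration of why a pool that retains every returned buffer and never evicts accumulates memory as buffer sizes grow:

```rust
use std::collections::HashMap;

// Hypothetical buffer cache: buffers are pooled by exact size, and every
// returned buffer is retained forever. Each new size class therefore adds
// an allocation that is never freed.
#[derive(Default)]
struct BufferCache {
    pools: HashMap<usize, Vec<Vec<u8>>>, // size -> retained buffers
}

impl BufferCache {
    fn get(&mut self, size: usize) -> Vec<u8> {
        self.pools
            .get_mut(&size)
            .and_then(|pool| pool.pop())
            .unwrap_or_else(|| vec![0u8; size])
    }
    fn put(&mut self, buf: Vec<u8>) {
        self.pools.entry(buf.capacity()).or_default().push(buf);
    }
    fn retained_bytes(&self) -> usize {
        self.pools.values().flatten().map(|b| b.capacity()).sum()
    }
}

fn main() {
    let mut cache = BufferCache::default();
    // A workload that needs a slightly larger buffer each frame never
    // reuses the smaller ones, so retained memory is the SUM of all past
    // sizes, not the maximum.
    for frame in 1..=8 {
        let buf = cache.get(frame * 1024 * 1024);
        cache.put(buf);
    }
    // (1 + 2 + ... + 8) MiB = 36 MiB retained, though at most 8 MiB was
    // ever live at once.
    println!("retained: {} bytes", cache.retained_bytes());
}
```

Under this model, "avoiding reinserting buffers" trades the retained sum for a fresh allocation per frame, which is exactly the performance cost mentioned above.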