
device.create_compute_pipeline hangs

Open peters-david opened this issue 3 years ago • 22 comments

I want to run a compute shader. Everything up to and including device.create_shader_module runs without problems and produces no validation errors. The next step, calling device.create_compute_pipeline, hangs. The system monitor shows this call using around 3 GB of RAM.

I created a repo where you can reproduce this issue: Issue repo

I have probably just misunderstood something about WGSL, but I am not sure how to find the problem since Naga doesn't give me any errors.

Is this related to the compute shader being rather big? Could you give me some pointers on how to find the problem?

Tried this on different systems:
  • Linux Ubuntu / Mesa Intel Iris Plus Graphics (ICL GT2)
  • Linux Ubuntu / NVIDIA Quadro M4000
  • Windows 10 / NVIDIA Quadro M4000

peters-david, Mar 08 '22 16:03

So it hangs on 3 different systems, technically? That's definitely unexpected!

kvark, Mar 09 '22 04:03

Yes, also tested on Ubuntu / Intel HD Graphics 3000 (SNB GT2) with the same result.

peters-david, Mar 09 '22 06:03

Any progress on this? A few days ago I tried running the WebGPU samples on the latest Firefox Nightly, and all samples with compute shaders seemed to suffer from the same issue. No WebGPU graphics were displayed (not even the parts that don't require compute shaders, if there were any), and RAM usage was anomalously high. 3 GB sounds about right; from my memory of the RAM graph I would estimate maybe 4 GB, but that was a few days ago. The system was a laptop running Ubuntu 22.04 with an NVIDIA GPU and an Intel CPU. I think the GPU was probably an NVIDIA RTX 30xx series card, but I don't know exactly which one. The whole computer cost only $800 (in 2020), with 8 GB of RAM and a 256 GB SSD, so nothing too high-end.

Edit: it turns out it was a GTX 1650.

JasonS05, May 17 '23 23:05

I tested just now on an iMac running macOS Mojave (10.14.6) with a "4 GHz Intel Core i7" and an "AMD Radeon R9 M295X 4 GB". While the WebGPU samples page was open, Firefox Nightly's memory usage rose steadily but slowly, with no apparent limit. At 9 GB I switched to a different tab and memory usage stopped rising; when I switched back, it resumed rising. Closing the WebGPU tab instantly dropped memory usage to only a few hundred MB. The leak began when I opened the Cornell Box sample (that one specifically; the other compute shader samples didn't do it) and continued even when I switched to other samples, even the simplest one. Only closing the tab stopped it. This sample also gave an error about using an unsupported texture format (I think "bgra8unorm" or something similar). Also, to get any of the samples to run, I had to enable the "gfx.webgpu.ignore-blocklist" setting in about:config and restart the browser.

JasonS05, May 17 '23 23:05

Does this issue also happen in Google Chrome?

ErichDonGubler, May 18 '23 04:05

On the WebGPU samples website, all samples except the Cornell Box worked in Chrome on my iMac, so compute support itself works fine. As for the Cornell Box memory leak, I'll test that in Chrome tomorrow.

JasonS05, May 18 '23 06:05

OK, that's strange. Today WebGPU isn't working at all in Chrome on my iMac; I'm just getting TypeError: Cannot read properties of null (reading 'requestDevice'). I even tried enabling WebGPU developer features, with no luck. The Chrome version is 113.0.5672.92. But I know it definitely worked a few days ago, unless I'm seriously hallucinating.

JasonS05, May 18 '23 21:05

@JasonS05 the issues you are facing might not be related to this bug report. Please try to reproduce this issue by trying to run the repo linked in the description or file a bug here for Firefox issues.

teoxoy, Jun 05 '23 10:06

I'm hitting https://github.com/gfx-rs/wgpu/issues/4393 while trying to run this on the DX12 backend. On Vulkan, it doesn't hang but takes minutes for the pipeline to be created.

@peters-david was the call to create_compute_pipeline just slow or was it really hanging (never completing)?

teoxoy, Jun 05 '23 11:06

@teoxoy It never completed for me. The longest I waited was around 30 minutes. It may have completed after that but I didn't bother to wait longer.

peters-david, Jun 05 '23 11:06
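One way to tell a call that is merely slow from one that never returns is to run it on a worker thread and wait with a deadline. The following is a minimal standard-library sketch; run_with_timeout is a hypothetical helper, and in a real program the closure would wrap the create_compute_pipeline call (assuming the device can be moved or shared into the thread, e.g. behind an Arc). On timeout the worker is not cancelled, because Rust threads cannot be killed from outside; it keeps running detached.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `work` on a separate thread and waits at most
/// `limit` for its result. Returns `Some(result)` if it finished in time and
/// `None` if the deadline passed (or the worker panicked). A timed-out worker
/// is not stopped; it keeps running in the background.
fn run_with_timeout<T, F>(limit: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: it only occurs if the caller already gave up.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}
```

With a generous limit (say, 30 minutes), a None result would support "really hanging" rather than "just slow".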

Did you notice the RAM usage continuously increasing? That's what I noticed while it was creating the pipeline.

Also, this issue is one year old; do you have any new findings?

teoxoy, Jun 05 '23 12:06

@teoxoy Yes, RAM usage increased. I haven't really worked on it since then, sorry, but I can test again if it helps.

peters-david, Jun 05 '23 12:06

I see, no problem. If you could narrow down the slowness to a specific section of code within wgpu by profiling the test app, that would be appreciated.

teoxoy, Jun 05 '23 13:06
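As a coarse first step before a full profiler, each phase of the test app can be timed separately so that the slow one stands out. This is a standard-library sketch; PhaseTimer and the phase names are hypothetical, and wall-clock timing only narrows the problem down to a single call, not to the code inside wgpu that a profiler would show.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper that records how long each named phase of a program
/// takes, so the slow step (e.g. pipeline creation) stands out.
struct PhaseTimer {
    phases: Vec<(&'static str, Duration)>,
}

impl PhaseTimer {
    fn new() -> Self {
        PhaseTimer { phases: Vec::new() }
    }

    /// Runs `f`, records its wall-clock duration under `name`, and returns
    /// whatever `f` returned.
    fn time<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.phases.push((name, start.elapsed()));
        out
    }

    /// Name and duration of the slowest recorded phase, if any were recorded.
    fn slowest(&self) -> Option<(&'static str, Duration)> {
        self.phases.iter().copied().max_by_key(|&(_, d)| d)
    }

    /// Prints one line per phase, in the order the phases ran.
    fn report(&self) {
        for (name, d) in &self.phases {
            println!("{name:<24} {d:?}");
        }
    }
}
```

In the test app, steps such as create_shader_module and create_compute_pipeline would each be wrapped in a time(...) call, followed by report().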

> @JasonS05 the issues you are facing might not be related to this bug report. Please try to reproduce this issue by trying to run the repo linked in the description or file a bug here for Firefox issues.

I just tried compiling the linked repo, but it had several compile errors. As I am not familiar with Rust, I do not know how to proceed. These are the error messages:


   Compiling test v0.1.0 (/home/jason/Desktop/Coding Stuff/github/peters-david.test)
error[E0308]: mismatched types
    --> src/gpu.rs:19:60
     |
19   |         let instance: wgpu::Instance = wgpu::Instance::new(wgpu::Backends::all());
     |                                        ------------------- ^^^^^^^^^^^^^^^^^^^^^ expected struct `InstanceDescriptor`, found struct `Backends`
     |                                        |
     |                                        arguments to this function are incorrect
     |
note: associated function defined here
    --> /home/jason/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/8b6599b/wgpu/src/lib.rs:1343:12
     |
1343 |     pub fn new(instance_desc: InstanceDescriptor) -> Self {
     |            ^^^

error[E0308]: mismatched types
    --> src/gpu.rs:67:46
     |
67   |       let shader = device.create_shader_module(&wgpu::ShaderModuleDescriptor {
     |  _________________________--------------------_^
     | |                         |
     | |                         arguments to this function are incorrect
68   | |         label: None,
69   | |         source: wgpu::ShaderSource::Wgsl(Cow::Borrowed(include_str!("shader.wgsl"))),
70   | |     });
     | |_____^ expected struct `ShaderModuleDescriptor`, found `&ShaderModuleDescriptor<'_>`
     |
note: associated function defined here
    --> /home/jason/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/8b6599b/wgpu/src/lib.rs:1948:12
     |
1948 |     pub fn create_shader_module(&self, desc: ShaderModuleDescriptor) -> ShaderModule {
     |            ^^^^^^^^^^^^^^^^^^^^
help: consider removing the borrow
     |
67   -     let shader = device.create_shader_module(&wgpu::ShaderModuleDescriptor {
67   +     let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
     |

error[E0599]: no method named `dispatch` found for struct `ComputePass` in the current scope
   --> src/gpu.rs:130:22
    |
130 |         compute_pass.dispatch(1, 1, 1); // Number of cells to run, the (x,y,z) size of item being processed
    |                      ^^^^^^^^ method not found in `ComputePass<'_>`

error[E0061]: this function takes 2 arguments but 1 argument was supplied
    --> src/gpu.rs:142:54
     |
142  |     let cpu_buffer_out_future = cpu_buffer_out_slice.map_async(wgpu::MapMode::Read);
     |                                                      ^^^^^^^^^--------------------- an argument is missing
     |
note: associated function defined here
    --> /home/jason/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/8b6599b/wgpu/src/lib.rs:2547:12
     |
2547 |     pub fn map_async(
     |            ^^^^^^^^^
help: provide the argument
     |
142  |     let cpu_buffer_out_future = cpu_buffer_out_slice.map_async(wgpu::MapMode::Read, /* value */);
     |                                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0277]: `()` is not a future
   --> src/gpu.rs:146:42
    |
146 |     if let Ok(()) = cpu_buffer_out_future.await {
    |                                          ^^^^^^
    |                                          |
    |                                          `()` is not a future
    |                                          help: remove the `.await`
    |
    = help: the trait `Future` is not implemented for `()`
    = note: () must be a future or must implement `IntoFuture` to be awaited
    = note: required for `()` to implement `IntoFuture`

Some errors have detailed explanations: E0061, E0277, E0308, E0599.
For more information about an error, try `rustc --explain E0061`.
error: could not compile `test` due to 5 previous errors
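Two of the errors above (E0061 and E0277) come from a wgpu API change: in the wgpu version the repo now resolves to, map_async takes a completion callback as its second argument instead of returning a future. A common way to wait for such a callback is to forward its result through a channel. The sketch below shows only that pattern with the standard library; fake_map_async is a hypothetical stand-in, not a wgpu function, and with wgpu the callback is generally only delivered after the device is polled (for example with device.poll).

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a callback-based API such as buffer mapping:
/// it reports completion by invoking `callback` later, from another thread,
/// instead of returning a future.
fn fake_map_async<F>(succeed: bool, callback: F)
where
    F: FnOnce(Result<(), String>) + Send + 'static,
{
    thread::spawn(move || {
        let result = if succeed {
            Ok(())
        } else {
            Err("mapping failed".to_string())
        };
        callback(result);
    });
}

/// Bridges the callback into a blocking wait by sending its result through a
/// channel and receiving it on the calling thread.
fn map_and_wait(succeed: bool) -> Result<(), String> {
    let (tx, rx) = mpsc::channel();
    fake_map_async(succeed, move |result| {
        // `rx` stays alive until `recv` returns, so this send cannot fail.
        tx.send(result).expect("receiver dropped");
    });
    // With a real GPU API, the device would be polled at this point so the
    // callback actually gets a chance to run.
    rx.recv().expect("callback never ran")
}
```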

As for filing a bug report with Bugzilla, I do not have an account there so I won't be posting a bug report there for the moment.

JasonS05, Jun 06 '23 03:06

Add rev = "0ac9ce002656565ccd05b889f5856f4e2c38fa73" (it was the latest commit on the day the bug was filed) to the wgpu entry in Cargo.toml.

> As for filing a bug report with Bugzilla, I do not have an account there so I won't be posting a bug report there for the moment.

Logging in via GitHub should work, but it's up to you.

teoxoy, Jun 06 '23 13:06

Unfortunately, that produced a new error about there being no suitable version of web-sys. Full error:


    Updating git repository `https://github.com/gfx-rs/wgpu`
    Updating crates.io index
    Updating git repository `https://github.com/gfx-rs/naga`
    Updating git repository `https://github.com/gfx-rs/metal-rs`
error: failed to select a version for `web-sys`.
    ... required by package `wgpu v0.12.0 (https://github.com/gfx-rs/wgpu?rev=0ac9ce002656565ccd05b889f5856f4e2c38fa73#0ac9ce00)`
    ... which satisfies git dependency `wgpu` of package `test v0.1.0 (/home/jason/Desktop/Coding Stuff/github/peters-david.test)`
versions that meet the requirements `^0.3.53` (locked to 0.3.63) are: 0.3.63

the package `wgpu` depends on `web-sys`, with features: `GpuBufferUsage` but `web-sys` does not have these features.


failed to select a version for `web-sys` which could resolve this conflict

JasonS05, Jun 08 '23 02:06

Run cargo clean and delete the Cargo.lock file (at least that's what I did).

teoxoy, Jun 08 '23 12:06

OK, it works now. When I ran the program, it seemed to hang at the described spot, with total system memory usage hovering around 5.8 GiB. After a couple of minutes it finished whatever it was doing and the program exited normally, leaving system memory usage at 3.9 GiB. Running the program again does not reproduce the hang, and the whole thing executes in under a second.

This was on my Ubuntu 22.04.2 LTS system with the GTX 1650.

JasonS05, Jun 08 '23 21:06

I ran into this issue on a MacBook Air with an M1 processor. I found the cause to be a large multi-dimensional array in the workgroup memory space. I made a minimal repro case here: https://github.com/vimwitch/webgpu-hang-repro
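Since the trigger here is a large array in workgroup memory, it may help to compare the array's size with the default workgroup storage limit, which WebGPU sets at 16384 bytes (maxComputeWorkgroupStorageSize; wgpu's Limits::default() uses the same value for max_compute_workgroup_storage_size). This is a minimal sketch, assuming nested arrays of 4-byte scalars such as f32, whose stride is exactly 4 bytes; vector element types like vec3<f32> have larger strides and are not handled. It does not by itself explain the hang; it only shows how large such an array is.

```rust
/// Default WebGPU limit for workgroup storage, in bytes
/// (maxComputeWorkgroupStorageSize).
const DEFAULT_MAX_WORKGROUP_STORAGE: u64 = 16_384;

/// Bytes occupied by a nested WGSL array of 4-byte scalars (f32, i32, u32).
/// For example, `array<array<f32, 64>, 32>` has dims [32, 64].
/// Returns None if the size overflows u64.
/// Assumption: scalar elements with a 4-byte stride (no vectors or structs).
fn scalar_array_bytes(dims: &[u64]) -> Option<u64> {
    dims.iter().try_fold(4u64, |acc, &d| acc.checked_mul(d))
}

/// True if the array fits within the default workgroup storage limit.
fn fits_default_limit(dims: &[u64]) -> bool {
    matches!(scalar_array_bytes(dims), Some(b) if b <= DEFAULT_MAX_WORKGROUP_STORAGE)
}
```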

Some things I noticed during testing:

  • Problem does not occur for storage memory
  • Problem occurs with single dimensional arrays
  • After waiting for the pipeline to be created once, subsequent creations do not hang. Changing the size of the array causes the next creation to hang. Reverting to the previous value after waiting for the changed size to be created does not result in another hang.
  • Changing shader logic causes the hang to occur again
  • Changing workgroup size does not cause hang to occur again
  • If the shader logic does not touch the array the hang does not occur
  • During the hang, system memory and CPU use are unaffected
  • During the hang, the program's CPU use is 0 and its memory use is constant at 3.9 MB

Apple M1 MacBook Air, macOS 12.5

chancehudson, Nov 18 '23 12:11
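The observations above (the hang requires the shader to touch the array, and it reappears when the array size or the shader logic changes) suggest creating shader variants that differ only in the array dimensions and timing pipeline creation for each one. Below is a hypothetical generator for such variants; the WGSL it emits is an illustrative minimal shape, not the code from the linked repro, and rows and cols are assumed to be positive.

```rust
/// Hypothetical helper: builds a WGSL compute shader that declares a
/// `rows x cols` f32 array in workgroup memory and writes to it, so the
/// array is actually used (per the report, untouched arrays did not hang).
fn workgroup_array_shader(rows: u32, cols: u32, workgroup_size: u32) -> String {
    let lines = [
        format!("var<workgroup> tile: array<array<f32, {cols}>, {rows}>;"),
        String::new(),
        format!("@compute @workgroup_size({workgroup_size})"),
        "fn main(@builtin(local_invocation_index) idx: u32) {".to_string(),
        // Modulo keeps each index inside the array bounds.
        format!("    tile[idx % {rows}u][idx % {cols}u] = f32(idx);"),
        "}".to_string(),
    ];
    lines.join("\n")
}
```

Each generated string could be passed to create_shader_module through wgpu::ShaderSource::Wgsl, as in the original repo, with the create_compute_pipeline call timed for each size.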

I profiled the repro above: https://share.firefox.dev/3G3Al3W


chancehudson, Nov 18 '23 12:11

Is there any progress on this issue? I'm encountering a similar problem where my program stalls on the device.create_compute_pipeline line.

To me, it looks as if the compute pipeline pre-runs the shader on first creation, which causes this long stall before it completes: I noticed that the more time-intensive the functions I ran in my main function, the longer the pipeline took to create.

Forpee, Dec 15 '23 14:12

Assigning Teo to try to reproduce, investigate cause, and estimate size.

jimblandy, May 20 '24 05:05