Took more than 10 minutes to create compute pipeline on Linux
Description: Creating a compute pipeline becomes super slow once some changes have been made to the shader.
Repro steps: Ideally, a runnable example we can check out.
Expected vs observed behavior: faster
Extra materials: N/A
Platform: Linux Mint based on Ubuntu 22.04, i7 13th gen, 64GB, GTX4090
@GopherJ: This issue does not actually provide instructions for running an example of the behavior described. I don't think it's reasonable to expect contributors here to build and run binaries that are not trivial to validate for safety and security without instructions, so: Unless you can make your example small (i.e., everything in a pair of Cargo.toml and main.rs), determine what changes specifically are causing this behavior (perhaps as a diff), and quantify what "slow" means, I intend to close this issue.
Hi @ErichDonGubler, it's more a problem of running wgpu on Linux. If more context is needed I can try to make one later.
And it's not related to my code, because the bottleneck is in the create_compute_pipeline API.
Our CI regularly runs create_compute_pipeline with various inputs on all platforms without this issue, so it clearly is related to either the sample at hand or your setup. So yes, more context is needed!
I'll try to provide a reproducible example later.
It would be useful to know how long the underlying vkCreateComputePipelines call is taking, as I suspect that the hang is entirely within that call.
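A rough way to get a first number from the application side is to wrap the suspect call in a wall-clock timer. The sketch below is only an illustration: the `timed` helper is hypothetical, and the sleeping closure is a stand-in for the real call, which in the repro would presumably be something like `device.create_compute_pipeline(&descriptor)`. Note that this measures everything wgpu does inside that call, not just vkCreateComputePipelines, so it gives an upper bound rather than the driver time itself.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` once and returns its result together
/// with the wall-clock time the call took.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

fn main() {
    // Stand-in workload. In the actual repro, the closure would wrap the
    // slow call instead, e.g. `|| device.create_compute_pipeline(&descriptor)`,
    // where `device` and `descriptor` are assumed to exist in the caller's code.
    let (value, elapsed) = timed(|| {
        std::thread::sleep(Duration::from_millis(50));
        42
    });
    println!("call returned {value} after {elapsed:?}");
}
```

If the elapsed time reported here accounts for nearly all of the ten minutes, that would support the suspicion that the time is spent inside pipeline creation rather than in the surrounding test code.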
https://github.com/GopherJ/webgpu-shaders
@ErichDonGubler @cwfitzgerald here is the repro
if you try to run:
cargo test --release --no-default-features --features std -- --test-threads 1 --nocapture
it basically hangs for a while compiling OR creating the compute pipeline. On macOS I don't observe this; things are super fast.
Environment:
any idea?
I don't have a Linux environment handy at the moment, so I can't directly contribute debugging here. Inspecting the repro, however, I suspect there is still some work to do to make the example smaller:
- If the issue is creating a compute pipeline, not executing one, we shouldn't need to keep any of the code past the stage of creating a compute pipeline. Things like constructing and executing compute passes should be unnecessary.
- I see multiple compute pipelines being created, but this bug has only mentioned a specific pipeline being slow. We should be able to limit the reproducible example to code that creates only that pipeline.
- Your cargo test reproduction steps don't note which #[test] entries are slow. Is it all of them? Only some of them? We should narrow the reproduction steps down to only one of these cases, if possible.
The Mozilla folks aren't going to be prioritizing this for the moment. As always, others are welcome to investigate.
I have a similar issue in that create_compute_pipeline seemingly takes forever.
I will work on making a smaller repro.