tracing regression in 0.20.0 vs 0.19.4
Description
Attempting to use copy_buffer_to_buffer in 0.20.0 crashes with:
thread 'main' panicked at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/resource.rs:121:17:
called `Option::unwrap()` on a `None` value
The full backtrace from one of my test runs is:
stack backtrace:
0: 0x5597354f12d2 - std::backtrace_rs::backtrace::libunwind::trace::he4ee80166a02c846
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
1: 0x5597354f12d2 - std::backtrace_rs::backtrace::trace_unsynchronized::h476faccf57e88641
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x5597354f12d2 - std::sys_common::backtrace::_print_fmt::h430c922a77e7a59c
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:68:5
3: 0x5597354f12d2 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hffecb437d922f988
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:44:22
4: 0x55973551660c - core::fmt::rt::Argument::fmt::hf3df69369399bfa9
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/fmt/rt.rs:142:9
5: 0x55973551660c - core::fmt::write::hd9a8d7d029f9ea1a
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/fmt/mod.rs:1153:17
6: 0x5597354ef14f - std::io::Write::write_fmt::h0e1226b2b8d973fe
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/io/mod.rs:1843:15
7: 0x5597354f10a4 - std::sys_common::backtrace::_print::hd2df4a083f6e69b8
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:47:5
8: 0x5597354f10a4 - std::sys_common::backtrace::print::he907f6ad7eee41cb
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:34:9
9: 0x5597354f255b - std::panicking::default_hook::{{closure}}::h3926193b61c9ca9b
10: 0x5597354f22b3 - std::panicking::default_hook::h25ba2457dea68e65
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:292:9
11: 0x5597354f29fd - std::panicking::rust_panic_with_hook::h0ad14d90dcf5224f
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:779:13
12: 0x5597354f2899 - std::panicking::begin_panic_handler::{{closure}}::h4a1838a06f542647
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:649:13
13: 0x5597354f17a6 - std::sys_common::backtrace::__rust_end_short_backtrace::h77cc4dc3567ca904
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:171:18
14: 0x5597354f2604 - rust_begin_unwind
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
15: 0x5597348934e5 - core::panicking::panic_fmt::h940d4fd01a4b4fd1
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
16: 0x5597348935a3 - core::panicking::panic::h8ddd58dc57c2dc00
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:145:5
17: 0x559734893486 - core::option::unwrap_failed::hf59153bb1e2fc334
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/option.rs:1985:5
18: 0x559734e379a0 - core::option::Option<T>::unwrap::hdeb99919510551b3
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/option.rs:933:21
19: 0x559734e379a0 - wgpu_core::resource::ResourceInfo<T>::id::he0c6517bd8e3f91d
at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/resource.rs:121:9
20: 0x559734e38def - <wgpu_core::resource::Buffer<A> as core::ops::drop::Drop>::drop::h8fe7e4be1a0f6653
at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/resource.rs:404:52
21: 0x559734de4da7 - core::ptr::drop_in_place<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>::ha1c38d8abfc5c79c
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
22: 0x559734eaf2ff - alloc::sync::Arc<T,A>::drop_slow::h52b7243041689c9a
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/sync.rs:1804:18
23: 0x559734eb3232 - <alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop::ha775173f5482ce52
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/sync.rs:2459:13
24: 0x559734dcd4bb - core::ptr::drop_in_place<alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>>::h50a1ca1948d557f6
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
25: 0x559734dd943f - core::ptr::drop_in_place<(wgpu_core::track::TrackerIndex,alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>)>::hcb77ccecad10dfdc
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
26: 0x559734c80e72 - core::ptr::mut_ptr::<impl *mut T>::drop_in_place::hf563684f0ac0247e
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mut_ptr.rs:1473:18
27: 0x559734c80e72 - hashbrown::raw::Bucket<T>::drop::h82efac29a399c8c8
at /rust/deps/hashbrown-0.14.3/src/raw/mod.rs:590:23
28: 0x559734c78498 - hashbrown::raw::RawTableInner::drop_elements::hc20a52aa5ac43ff3
at /rust/deps/hashbrown-0.14.3/src/raw/mod.rs:2379:17
29: 0x559734c79b80 - hashbrown::raw::RawTableInner::drop_inner_table::h322098f924a4e6c2
at /rust/deps/hashbrown-0.14.3/src/raw/mod.rs:2434:17
30: 0x559734c747fa - <hashbrown::raw::RawTable<T,A> as core::ops::drop::Drop>::drop::h8c81f5c568abf00a
at /rust/deps/hashbrown-0.14.3/src/raw/mod.rs:3678:13
31: 0x559734ddca5b - core::ptr::drop_in_place<hashbrown::raw::RawTable<(wgpu_core::track::TrackerIndex,alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>)>>::h848d236953d9150c
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
32: 0x559734ddeddb - core::ptr::drop_in_place<hashbrown::map::HashMap<wgpu_core::track::TrackerIndex,alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>,core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::h268807d3d564515d
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
33: 0x559734ddf22b - core::ptr::drop_in_place<std::collections::hash::map::HashMap<wgpu_core::track::TrackerIndex,alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>,core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::h109c75fb36db2a51
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
34: 0x559734de8fe7 - core::ptr::drop_in_place<wgpu_core::device::life::ResourceMaps<wgpu_hal::vulkan::Api>>::h1e5585fb807c2963
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
35: 0x559734d671bd - wgpu_core::device::life::LifetimeTracker<A>::triage_submissions::h9b06f4999f3535f0
at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/life.rs:413:9
36: 0x559734e285a8 - wgpu_core::device::resource::Device<A>::maintain::ha6ad7b20c3de2f07
at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/resource.rs:434:13
37: 0x559734b8ddd8 - wgpu_core::device::queue::<impl wgpu_core::global::Global>::queue_submit::h95b283a3c900a2eb
at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/queue.rs:1555:39
38: 0x559734b4c13e - <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::queue_submit::he97dd3b53d285061
at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.20.0/src/backend/wgpu_core.rs:2260:27
39: 0x559734b57423 - <T as wgpu::context::DynContext>::queue_submit::hb2174c2d1030c8b5
at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.20.0/src/context.rs:3025:13
40: 0x5597348a09e5 - wgpu::Queue::submit::hfab63398e5d30aab
at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.20.0/src/lib.rs:4981:27
41: 0x5597348a4980 - alan_generated_bin::read_buffer::hbedf71215769391d
at /home/damocles/.config/alan/alan_generated_bin/src/main.rs:266:5
42: 0x5597348a50b8 - alan_generated_bin::main::h1e3a29ab7fcab87a
at /home/damocles/.config/alan/alan_generated_bin/src/main.rs:309:20
43: 0x559734894f6b - core::ops::function::FnOnce::call_once::h5fce3699794672b3
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
44: 0x55973489c71e - std::sys_common::backtrace::__rust_begin_short_backtrace::h6179380494bff8d2
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
45: 0x5597348a1441 - std::rt::lang_start::{{closure}}::hf0241278dd9be494
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
46: 0x5597354eb253 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h52f5991f9ab8b369
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
47: 0x5597354eb253 - std::panicking::try::do_call::h0ac4bee9a397a1bf
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
48: 0x5597354eb253 - std::panicking::try::hc005decaf198d0ed
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
49: 0x5597354eb253 - std::panic::catch_unwind::hb0f967d870b2a382
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
50: 0x5597354eb253 - std::rt::lang_start_internal::{{closure}}::hd140b84b0efe534b
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
51: 0x5597354eb253 - std::panicking::try::do_call::h1ddfaf1d0d576c38
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
52: 0x5597354eb253 - std::panicking::try::hdd4bdf855547659f
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
53: 0x5597354eb253 - std::panic::catch_unwind::h276ba91c7706110c
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
54: 0x5597354eb253 - std::rt::lang_start_internal::h103c42a9c4e95084
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
55: 0x5597348a141a - std::rt::lang_start::hce91f7cfea2f3ec4
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:165:17
56: 0x5597348a536e - main
57: 0x7f6bdeeea088 - __libc_start_call_main
58: 0x7f6bdeeea14b - __libc_start_main_impl
59: 0x559734893c85 - _start
60: 0x0 - <unknown>
Somehow, something internal to wgpu doesn't have an ID. I temporarily added #[derive(Debug)] to my own structures and debug-logged the buffers I'm passing to copy_buffer_to_buffer, and they all had IDs, so I'm not sure what exactly is going on. Looking a bit higher up the stack, it seems related to the automatic GPU resource cleanup logic in 0.20.0, though I don't understand why it would be triggered.
Repro steps
I put a trimmed version of the code in a gist; you just need to copy the main.rs file to src/main.rs in a normal Rust project to test it.
Expected vs observed behavior
This code (with minor modifications to remove the compilation_options field from the ComputePipelineDescriptor) compiles and runs successfully on 0.19.4, but crashes on 0.20.0.
Extra materials
I include the trace.zip it generated.
Platform
I've tested this on Fedora/x86-64 and Debian/RISC-V with the same results; only wgpu version 0.20.0 is affected.
I edited my local cargo cache to insert a debug log on the buffer that is about to be freed and is crashing things, which you can see below (with some hand formatting for better legibility):
Buffer {
raw: <snatchable>,
device: Device {
adapter: "<Adapter>",
limits: Limits {
max_texture_dimension_1d: 16384,
max_texture_dimension_2d: 16384,
max_texture_dimension_3d: 2048,
max_texture_array_layers: 2048,
max_bind_groups: 8,
max_bindings_per_bind_group: 1000,
max_dynamic_uniform_buffers_per_pipeline_layout: 16,
max_dynamic_storage_buffers_per_pipeline_layout: 8,
max_sampled_textures_per_shader_stage: 8388606,
max_samplers_per_shader_stage: 8388606,
max_storage_buffers_per_shader_stage: 8388606,
max_storage_textures_per_shader_stage: 8388606,
max_uniform_buffers_per_shader_stage: 8388606,
max_uniform_buffer_binding_size: 2147483648,
max_storage_buffer_binding_size: 2147483648,
max_vertex_buffers: 16,
max_buffer_size: 2147483647,
max_vertex_attributes: 32,
max_vertex_buffer_array_stride: 2048,
min_uniform_buffer_offset_alignment: 32,
min_storage_buffer_offset_alignment: 32,
max_inter_stage_shader_components: 128,
max_color_attachments: 8,
max_color_attachment_bytes_per_sample: 32,
max_compute_workgroup_storage_size: 65536,
max_compute_invocations_per_workgroup: 1024,
max_compute_workgroup_size_x: 1024,
max_compute_workgroup_size_y: 1024,
max_compute_workgroup_size_z: 1024,
max_compute_workgroups_per_dimension: 65535,
min_subgroup_size: 64,
max_subgroup_size: 64,
max_push_constant_size: 256,
max_non_sampler_bindings: 4294967295
},
features: Features(DEPTH_CLIP_CONTROL | DEPTH32FLOAT_STENCIL8 | TEXTURE_COMPRESSION_BC | TIMESTAMP_QUERY | INDIRECT_FIRST_INSTANCE | SHADER_F16 | RG11B10UFLOAT_RENDERABLE | BGRA8UNORM_STORAGE | FLOAT32_FILTERABLE | TEXTURE_FORMAT_16BIT_NORM | TEXTURE_ADAPTER_SPECIFIC_FORMAT_FEATURES | PIPELINE_STATISTICS_QUERY | TIMESTAMP_QUERY_INSIDE_ENCODERS | TIMESTAMP_QUERY_INSIDE_PASSES | MAPPABLE_PRIMARY_BUFFERS | TEXTURE_BINDING_ARRAY | BUFFER_BINDING_ARRAY | STORAGE_RESOURCE_BINDING_ARRAY | SAMPLED_TEXTURE_AND_STORAGE_BUFFER_ARRAY_NON_UNIFORM_INDEXING | UNIFORM_BUFFER_AND_STORAGE_TEXTURE_ARRAY_NON_UNIFORM_INDEXING | PARTIALLY_BOUND_BINDING_ARRAY | MULTI_DRAW_INDIRECT | MULTI_DRAW_INDIRECT_COUNT | PUSH_CONSTANTS | ADDRESS_MODE_CLAMP_TO_ZERO | ADDRESS_MODE_CLAMP_TO_BORDER | POLYGON_MODE_LINE | POLYGON_MODE_POINT | CONSERVATIVE_RASTERIZATION | VERTEX_WRITABLE_STORAGE | CLEAR_TEXTURE | SPIRV_SHADER_PASSTHROUGH | MULTIVIEW | SHADER_UNUSED_VERTEX_OUTPUT | TEXTURE_FORMAT_NV12 | SHADER_F64 | SHADER_I16 | SHADER_PRIMITIVE_INDEX | DUAL_SOURCE_BLENDING | SHADER_INT64 | SUBGROUP | SUBGROUP_VERTEX | SUBGROUP_BARRIER),
downlevel: DownlevelCapabilities {
flags: DownlevelFlags(COMPUTE_SHADERS | FRAGMENT_WRITABLE_STORAGE | INDIRECT_EXECUTION | BASE_VERTEX | READ_ONLY_DEPTH_STENCIL | NON_POWER_OF_TWO_MIPMAPPED_TEXTURES | CUBE_ARRAY_TEXTURES | COMPARISON_SAMPLERS | INDEPENDENT_BLEND | VERTEX_STORAGE | ANISOTROPIC_FILTERING | FRAGMENT_STORAGE | MULTISAMPLED_SHADING | DEPTH_TEXTURE_AND_BUFFER_COPIES | WEBGPU_TEXTURE_FORMAT_SUPPORT | BUFFER_BINDINGS_NOT_16_BYTE_ALIGNED | UNRESTRICTED_INDEX_BUFFER | FULL_DRAW_INDEX_UINT32 | DEPTH_BIAS_CLAMP | VIEW_FORMATS | UNRESTRICTED_EXTERNAL_TEXTURE_COPIES | SURFACE_VIEW_FORMATS | NONBLOCKING_QUERY_RESOLVE | VERTEX_AND_INSTANCE_INDEX_RESPECTS_RESPECTIVE_FIRST_VALUE_IN_INDIRECT_DRAW),
limits: DownlevelLimits,
shader_model: Sm5
}
},
usage: BufferUsages(MAP_WRITE | COPY_SRC),
size: 16,
initialization_status: RwLock { data: InitTracker { uninitialized_ranges: [] } },
sync_mapped_writes: Mutex { data: None },
info: ResourceInfo {
id: None,
tracker_index: TrackerIndex(1),
tracker_indices: Some(SharedTrackerIndexAllocator { inner: Mutex { data: } }),
submission_index: 0,
label: "(wgpu internal) initializing unmappable buffer"
},
map_state: Mutex { data: Idle },
bind_groups: Mutex { data: [] }
}
I don't create a buffer with MAP_WRITE | COPY_SRC flags set, and the label "(wgpu internal)..." indicates this is probably something internal to the copy_buffer_to_buffer function. I still don't know how it has no ID, though.
So only one place creates a label with that name: device_create_buffer in wgpu_core/src/device/global.rs.
Some debug logging on the args there reveals:
desc: BufferDescriptor { label: None, size: 16, usage: BufferUsages(COPY_SRC | COPY_DST | STORAGE), mapped_at_creation: true }
The described buffer to create is supposedly the buffer I'm copying from, but by this point in the trace, that buffer should already exist.
But if I slap a seemingly useless MAP_WRITE onto that buffer, avoiding whatever this temporary buffer is, the code compiles and runs on 0.20.0.
So I think that's the end of my bug report for now, as I don't understand why this temporary buffer is needed when copying from this buffer, and I don't know why it's not getting a proper ID during creation, but I do have a workaround for the time being.
@dfellis: Just the context I'm aware of: We're in the middle of transitioning backend resources to being tracked only by Arc, rather than ID. There are, unfortunately, some places where we are still tracking by ID. When code that only keeps track of Arcs attempts to use APIs that use IDs, then the code has no choice but to panic, since we're definitely doing something we Shouldn't Do™.
I believe that the solution here is to progress in our migration of resource tracking code that uses Arcs instead of IDs.
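The failure mode described above can be sketched in isolation. This is a minimal, self-contained model and not wgpu-core's actual code: `ResourceInfo`, `Buffer`, `try_id`, and `describe_for_trace` are simplified or hypothetical stand-ins that only mimic the shape seen in the backtrace (an optional ID being unwrapped for a resource that is held only through an `Arc`).

```rust
use std::sync::Arc;

/// Simplified stand-in for a resource's bookkeeping; NOT the real wgpu-core type.
#[derive(Debug)]
struct ResourceInfo {
    /// `None` models an internally created resource that is tracked only by `Arc`.
    id: Option<u32>,
    label: String,
}

impl ResourceInfo {
    /// ID-based accessor: panics when the resource never received an ID,
    /// which is the shape of the `Option::unwrap()` failure in the report.
    fn id(&self) -> u32 {
        self.id.unwrap()
    }

    /// Hypothetical non-panicking accessor, added here only for illustration.
    fn try_id(&self) -> Option<u32> {
        self.id
    }
}

/// Simplified stand-in for a buffer resource.
#[derive(Debug)]
struct Buffer {
    info: ResourceInfo,
}

/// Models ID-based code (for example, something that writes IDs into a trace)
/// being handed a resource that may only be tracked by `Arc`.
/// The output format is made up for this sketch.
fn describe_for_trace(buffer: &Arc<Buffer>) -> String {
    match buffer.info.try_id() {
        Some(id) => format!("Buffer(Id({}))", id),
        None => format!("Buffer(<no id: {}>)", buffer.info.label),
    }
}
```

Calling `id()` on a buffer whose `id` is `None` panics, while `describe_for_trace` falls back to the label. Whether the real fix takes this form is not stated in the thread; the sketch only shows why code that keeps just an `Arc` cannot safely call an API that assumes an ID exists.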
CC @teoxoy, @jimblandy.
@ErichDonGubler understood. Do you know what the timeline is on that conversion?
I've realized that my hack to work around this won't cut it: it fails on the OpenGL backend, since MAP_WRITE is only allowed to be paired with COPY_SRC. That it's even working at all on the Vulkan backend is probably itself a bug?
And with that, I probably have to hold off on upgrading until copy_buffer_to_buffer works without this failure, or I find a cross-backend workaround and leave a big TODO to try and move back to the normal API.
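The usage rule mentioned above can be sketched as a small check. This is a self-contained illustration, not wgpu's validation code: the flag constants are stand-ins with illustrative values, and `usage_is_valid` is a hypothetical helper. It assumes the documented WebGPU-style rule that MAP_WRITE may only be combined with COPY_SRC and MAP_READ only with COPY_DST, and that the MAPPABLE_PRIMARY_BUFFERS feature lifts that restriction. The debug dump earlier in this issue lists MAPPABLE_PRIMARY_BUFFERS among the Vulkan device's features, which may be why the workaround was accepted there.

```rust
// Stand-in bit flags; the values are illustrative, not wgpu's constants.
const MAP_READ: u32 = 1 << 0;
const MAP_WRITE: u32 = 1 << 1;
const COPY_SRC: u32 = 1 << 2;
const COPY_DST: u32 = 1 << 3;
const STORAGE: u32 = 1 << 4;

/// Hypothetical check: returns true if `usage` is an allowed combination.
/// `mappable_primary_buffers` models the device feature that removes
/// the restriction on what mappable buffers may be combined with.
fn usage_is_valid(usage: u32, mappable_primary_buffers: bool) -> bool {
    if mappable_primary_buffers {
        return true;
    }
    // MAP_WRITE may only appear together with COPY_SRC.
    let write_ok = (usage & MAP_WRITE) == 0 || (usage & !(MAP_WRITE | COPY_SRC)) == 0;
    // MAP_READ may only appear together with COPY_DST.
    let read_ok = (usage & MAP_READ) == 0 || (usage & !(MAP_READ | COPY_DST)) == 0;
    write_ok && read_ok
}
```

Under these assumptions, `COPY_SRC | COPY_DST | STORAGE` is always accepted, while adding `MAP_WRITE` to it is accepted only when the feature flag is set.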
@dfellis: We don't currently have one, but if this conversion is blocking or regressing user code, there's a good chance we can justify prioritizing it!
I'll let others comment on further context here, since I don't have it. 😅
@ErichDonGubler got it! But in the meantime, I have finally realized what's actually causing the crash in copy_buffer_to_buffer and it's the tracing itself.
I turned on tracing when I couldn't get my code working on the RISC-V single board computer I bought to specifically try and catch bugs in my code from platform assumptions, and then started getting errors. (Hooray, purchase justified ;) )
In the meantime I figured out that the issue was that the Vulkan driver on this SBC doesn't implement everything wgpu needs, so I added logic to scan all of the adapters and pick the first one for which is_webgpu_compliant is true. But I did that on a new branch off of my main, which had wgpu on 0.19.4 with tracing off, while the branch I was debugging on was on 0.20.0 with tracing turned on.
With the apparent fix for 0.20.0 being to slap MAP_WRITE onto a buffer that it shouldn't be on, I started prepping that for actual merging by turning off tracing. Tests continued to pass on my x86-64 machines, but when I tried to run it on the RISC-V machine I got a validation error saying that I'm configuring the buffer incorrectly.
Okay, I agree, so I tried to figure out how to replicate whatever copy_buffer_to_buffer is doing internally with a temporary MAP_WRITE buffer. I created some extra temporary buffers and tried to insert them into the command queue, and got more errors saying I was doing things incorrectly when I tried to write into the MAP_WRITE buffer so I could then use it to write out to another buffer. Then I just reverted all of the changes in that file and re-ran the failing test so I could get the stacktrace on my machine, and it just worked.
Tested it on the RISC-V SBC and it also worked there: the difference is just removing features = ["trace"] in the Cargo.toml file.
So now I would say my real bug report is that the trace feature is broken by this migration to Arc, because it looks like the trace output requires IDs? (See snippet from the trace below) And this breakage in trace then produces a super misleading rabbit hole to spend a couple of days on.
Submit(2, [
CopyBufferToBuffer(
src: Id(0, 1, Vulkan),
src_offset: 0,
dst: Id(1, 1, Vulkan),
dst_offset: 0,
size: 16,
),
]),
I think this should have been fixed by f2ea30772c5a7c6777aee0511dd9b7198eb61329 (https://github.com/gfx-rs/wgpu/pull/5871) (not part of any release yet). @dfellis could you confirm?
So unfortunately I am not 100% sure that things are fixed.
Here is a screenshot of me re-branching off of the commit where I was trying to move to 0.20.0 at that time:
(I ran the test executable a second time outside of cargo test so we can see the failure in src/resource.rs:121)
When I run the test on the commit you pointed at, I get a bare segfault and trace.ron file ends with the same value as I originally reported.
So that looks like it's definitely still broken. But when I run the same commit with tracing on but also with the is_webgpu_compliant check, it succeeds.
So, I think tracing is working, but I'm not entirely sure. When I reproduce the exact same path as before, using the buggy Vulkan driver, trace.ron still has the same output as before, but this time with a segfault instead of an unwrap exploding. So perhaps it's just coincidental that the buggy Vulkan drivers are blowing up just after the tracing was before, or perhaps the tracing is just blowing up in a new and more exciting way?
I'm not really sure how to make that distinction, unfortunately.
I think the initial issue was resolved: with f2ea30772c5a7c6777aee0511dd9b7198eb61329 there will be no more ID unwrapping.
so perhaps it's just coincidental that the buggy Vulkan drivers are blowing up just after the tracing was before, or perhaps the tracing is just blowing up in a new and more exciting way?
Since we now trace as soon as possible the segfault is probably in:
https://github.com/gfx-rs/wgpu/blob/f2ea30772c5a7c6777aee0511dd9b7198eb61329/wgpu-core/src/resource.rs#L478
Could you debug the segfault to see what's causing it?
This has been fixed by f2ea30772c5a7c6777aee0511dd9b7198eb61329 (https://github.com/gfx-rs/wgpu/pull/5871). Please open a new issue for the segfault if it turns out to be due to an issue in our implementation.
Hey, it took a while to get back to this because the NVMe drive in the machine died mid-debugging and I had to debug that first.
Anyways, it looks like it is blowing up inside of the VK driver so feel free to keep this closed.
No worries! Are you having issues with that GPU/driver in other apps as well? We could still be at fault for segfaults inside drivers if we use the API improperly.
So how I resolved my issues on this test machine last month was to iterate through all of the adapters and filter out any adapter where is_webgpu_compliant returns false. I just didn't expect the listing of adapters to include non-compliant drivers by default, so I don't think the crash when using a non-compliant driver is "your fault" but I might want the adapters list to pre-filter by default and you have to manually opt-in for the non-compliant drivers where the developer really has to know what they're doing and know what wgpu is doing under the hood to use it safely.
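The filtering described above can be sketched as follows. This is a self-contained model, not code from the gist: `AdapterSummary` and `pick_compliant` are hypothetical stand-ins for whatever the real code queries from wgpu. In wgpu itself the flag would presumably come from something like `adapter.get_downlevel_capabilities().is_webgpu_compliant()`; treat that call as an assumption to verify against the version in use.

```rust
/// Hypothetical stand-in for the per-adapter information the real code
/// would query from wgpu; not a wgpu type.
#[derive(Debug, Clone, PartialEq)]
struct AdapterSummary {
    name: String,
    backend: String,
    /// Models the result of the adapter's WebGPU-compliance check.
    webgpu_compliant: bool,
}

/// Returns the first adapter that reports WebGPU compliance, if any,
/// skipping every adapter whose compliance check is false.
fn pick_compliant(adapters: &[AdapterSummary]) -> Option<&AdapterSummary> {
    adapters.iter().find(|a| a.webgpu_compliant)
}
```

Returning `Option` keeps the "no compliant adapter" case explicit, so the caller can decide whether to fail or to opt in to a non-compliant adapter deliberately.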
The GPU works fine with the OpenGL drivers for my use case, and after digging into things, I don't think any software on the machine uses the Vulkan drivers. (The GUI is GNOME, it's running Wayland with the WM being Mutter, and Mutter uses OpenGL, not Vulkan. It's a RISC-V machine so I can't pull up Steam and try to run some games on it to test on that front via Proton.)
Here's the about and screenfetch for the machine. Probably doesn't bring anything to the table, but just in case:
After I post this, I'll try running these vulkan test applications that I just found (afterwards just in case they completely crash this machine) and I'll let you know the results.
Hmm... Never mind on that. The instructions for building the example applications don't work because the applications all require an add_shader_library custom cmake function that isn't defined. After digging in a bit, it seems that's part of their Android testing, and building for a "normal" Linux is not really working anymore. I'll see if I can find anything else to test Vulkan with.
I just installed some demo Vulkan apps and they're all failing because the driver doesn't have the "VK_KHR_swapchain" extension.
The only thing I can find online about that being missing is that it needs to be enabled at device instantiation time and I presume these example applications would "know" to do that?
So my suspicion that this device's Vulkan driver is simply broken seems more likely. It's running a weird fork of Debian provided by the SBC manufacturer, so it's likely an issue with the drivers they got from PowerVR or how they packaged them, so if I want to pursue this further, I should reach out to them, but as I said I am fine with the OpenGL backend for my needs.
So how I resolved my issues on this test machine last month was to iterate through all of the adapters and filter out any adapter where is_webgpu_compliant returns false. I just didn't expect the listing of adapters to include non-compliant drivers by default, so I don't think the crash when using a non-compliant driver is "your fault" but I might want the adapters list to pre-filter by default and you have to manually opt-in for the non-compliant drivers where the developer really has to know what they're doing and know what wgpu is doing under the hood to use it safely.
We do filter non-compliant Vulkan drivers out by default, but not non-compliant WebGPU drivers, since we have DownlevelFlags which tell you what functionality is missing that makes the device not WebGPU compliant. This is so that we have a wider reach. Maybe we should reconsider this being the default, but it's not something users have requested yet AFAIK.
The only thing I can find online about that being missing is that it needs to be enabled at device instantiation time and I presume these example applications would "know" to do that?
They should; we enable it, for example.
So my suspicion that this device's Vulkan driver is simply broken seems more likely. It's running a weird fork of Debian provided by the SBC manufacturer, so it's likely an issue with the drivers they got from PowerVR or how they packaged them, so if I want to pursue this further, I should reach out to them, but as I said I am fine with the OpenGL backend for my needs.
It does seem like something is misconfigured.