ash
ash copied to clipboard
Policy for null pointers returned by drivers?
Following up to https://github.com/gfx-rs/wgpu/issues/2091#issuecomment-947295524
TL;DR: ExtDebugUtilsFn::cmd_begin_debug_utils_label_ext is NULL
Something needs to check for NULL. It seems to me that Ash is in a better position to do this.
One way to proceed would be to return an Error from DebugUtils::new if any of the pointers are NULL. This is unfortunate because it's an API change, and in order to be consistent with the rest all initialization would have to return a result.
Another way is to add something like fn new_fallible specifically to DebugUtils to let clients work around bad drivers.
What driver is this? Are you certain that it advertises support for the extension?
AdapterInfo { name: "Intel(R) UHD Graphics 630", vendor: 32902, device: 16027, device_type: IntegratedGpu, backend: Vulkan } Driver Version 24.20.100.628
I haven't confirmed it myself. cc @cwfitzgerald, who was investigating this on Discord.
Here is how the initialization in wgpu goes:
- we gather a list of extensions here - https://github.com/gfx-rs/wgpu/blob/a59a627f124d03b34c826de64b77f5bf74d6af43/wgpu-hal/src/vulkan/instance.rs#L122
- In debug, it includes
ext::DebugUtils::name(): https://github.com/gfx-rs/wgpu/blob/a59a627f124d03b34c826de64b77f5bf74d6af43/wgpu-hal/src/vulkan/instance.rs#L159 - at the end of the function, we filter out ones not supported by the instance
- so the result of this function only contains extensions that are needed and supported - https://github.com/gfx-rs/wgpu/blob/a59a627f124d03b34c826de64b77f5bf74d6af43/wgpu-hal/src/vulkan/instance.rs#L476
- finally, we initialize
DebugUtilsextension here - https://github.com/gfx-rs/wgpu/blob/a59a627f124d03b34c826de64b77f5bf74d6af43/wgpu-hal/src/vulkan/instance.rs#L206
Thanks, wgpu's behavior looks sane to me. Debug utils are often implemented by layers, so I wonder if there might be a janky layer involved.
Overall, I'm reluctant to support ash going out of its way to defend against things that are expressly forbidden by the spec. I think this should be reported upstream, provided a driver update doesn't fix the user's problem. A mitigation could be applied at a higher layer by allowing the user to explicitly disable the debug extension (perhaps this is already supported?) and/or by blacklisting it for the driver. If we add extra checking/workarounds at the ash API level every time we see a driver bug, we might end up with arbitrarily large amounts of cruft, with little ability to remove it.
If we can work around this on our side, that's an acceptable outcome. The function we call - cmd_begin_debug_utils_label is a wrapper though, it doesn't tell us in any way if it's not possible to call.
Reviewing the specific implementation of ExtDebugUtilsFn::load, it should be impossible for ExtDebugUtilsFn::cmd_begin_debug_utils_label_ext to become null, because ash already checks that and drops in a stub function if it is null.
https://github.com/MaikKlein/ash/blob/e10bbf3063d9b84b9d8c04e6e2baae7d4881cce4/ash/src/vk/extensions.rs#L9965-L9984
Are you certain that's what's happening? If so, can you trace where the pointer is becoming null?
I've also run into this issue when running wgpu's boids example (using Ubuntu 18.04 with an AMD GPU), after running
sudo add-apt-repository ppa:oibaf/graphics-drivers
sudo apt update
sudo apt upgrade
sudo apt install libvulkan1 mesa-vulkan-drivers vulkan-utils
to install Vulkan drivers and then confirming that vulkan-smoketest works. The version of driver installed is mesa 20.0.8).
Using lldb, I found out that on a specific line in this code:
cmd_begin_debug_utils_label_ext: unsafe {
unsafe extern "system" fn cmd_begin_debug_utils_label_ext(
_command_buffer: CommandBuffer,
_p_label_info: *const DebugUtilsLabelEXT,
) {
panic!(concat!(
"Unable to load ",
stringify!(cmd_begin_debug_utils_label_ext)
))
}
let cname = ::std::ffi::CStr::from_bytes_with_nul_unchecked(
b"vkCmdBeginDebugUtilsLabelEXT\0",
);
let val = _f(cname);
// right here, specifically
if val.is_null() {
cmd_begin_debug_utils_label_ext
} else {
::std::mem::transmute(val)
}
}
val is not null. It instead points to a function whose disassembly is
libvulkan.so.1`___lldb_unnamed_symbol1120:
0x7ffff4f1aaa0 <+0>: movq (%rdi), %rax
0x7ffff4f1aaa3 <+3>: jmpq *0x640(%rax)
0x7ffff4f1aaa9 <+9>: nopl (%rax)
I found out that when the program reaches this code,
pub unsafe fn cmd_begin_debug_utils_label_ext(
&self,
command_buffer: CommandBuffer,
p_label_info: *const DebugUtilsLabelEXT,
) {
// right here
(self.cmd_begin_debug_utils_label_ext)(command_buffer, p_label_info)
}
the disassembly (and next instruction, marked by an arrow) is:
boids`cmd_begin_debug_utils_label_ext:
0x5555561bed90 <+0>: subq $0x28, %rsp
0x5555561bed94 <+4>: movq %rdx, (%rsp)
0x5555561bed98 <+8>: movq %rsi, %rax
0x5555561bed9b <+11>: movq (%rsp), %rsi
0x5555561bed9f <+15>: movq %rax, 0x8(%rsp)
0x5555561beda4 <+20>: movq %rdi, %rax
0x5555561beda7 <+23>: movq 0x8(%rsp), %rdi
0x5555561bedac <+28>: movq %rax, 0x10(%rsp)
0x5555561bedb1 <+33>: movq %rdi, 0x18(%rsp)
0x5555561bedb6 <+38>: movq %rsi, 0x20(%rsp)
-> 0x5555561bedbb <+43>: callq *0x28(%rax)
0x5555561bedbe <+46>: addq $0x28, %rsp
0x5555561bedc2 <+50>: retq
and the registers are:
rax = 0x0000555556ad2d20
rbx = 0x00007fffffffda80
rcx = 0x00007fffffff9830
rdx = 0x00007fffffff9830
rdi = 0x000055555a3c2eb0
rsi = 0x00007fffffff9830
rbp = 0x0000555556475de0 boids`__libc_csu_init
rsp = 0x00007fffffff97a0
r8 = 0x000000003b9cbe02
r9 = 0x0000000000000000
r10 = 0x0000555559eac070
r11 = 0x0000000000000000
r12 = 0x00005555556b8e50 boids`_start
r13 = 0x00007fffffffdb90
r14 = 0x00005555569e4b28
r15 = 0x0000000000000000
rip = 0x00005555561bedbb boids`cmd_begin_debug_utils_label_ext + 43 at extensions.rs:9584:9
rflags = 0x0000000000000206
cs = 0x0000000000000033
fs = 0x0000000000000000
gs = 0x0000000000000000
ss = 0x000000000000002b
ds = 0x0000000000000000
es = 0x0000000000000000
Reading from rax + 0x28 gives 0x00007ffff4f1aaa0, and disassembling from there gives the disassembly of libvulkan.so.1`___lldb_unnamed_symbol1120 again. Stepping at the instruction level, rax becomes 0x0000555559eb3c50, which is presumably the address to the command buffer. Dumping 0x650 bytes from there gives
0x555559eb3c50: a0 42 f2 f4 ff 7f 00 00 20 d2 78 f4 ff 7f 00 00 .B...... .x.....
0x555559eb3c60: 70 d4 78 f4 ff 7f 00 00 a0 d4 78 f4 ff 7f 00 00 p.x.......x.....
0x555559eb3c70: 10 d7 78 f4 ff 7f 00 00 a0 d7 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3c80: 80 db 78 f4 ff 7f 00 00 80 e1 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3c90: 90 e1 78 f4 ff 7f 00 00 10 e2 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3ca0: 40 e2 78 f4 ff 7f 00 00 50 e2 78 f4 ff 7f 00 00 @.x.....P.x.....
0x555559eb3cb0: 90 e4 78 f4 ff 7f 00 00 00 e5 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3cc0: 90 e5 78 f4 ff 7f 00 00 60 e2 78 f4 ff 7f 00 00 ..x.....`.x.....
0x555559eb3cd0: 60 e3 78 f4 ff 7f 00 00 f0 e3 78 f4 ff 7f 00 00 `.x.......x.....
0x555559eb3ce0: c0 e5 78 f4 ff 7f 00 00 f0 e7 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3cf0: d0 e9 78 f4 ff 7f 00 00 c0 ea 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3d00: 50 eb 78 f4 ff 7f 00 00 f0 eb 78 f4 ff 7f 00 00 P.x.......x.....
0x555559eb3d10: 70 f0 78 f4 ff 7f 00 00 d0 f2 78 f4 ff 7f 00 00 p.x.......x.....
0x555559eb3d20: 00 f6 78 f4 ff 7f 00 00 10 f7 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3d30: 70 f7 78 f4 ff 7f 00 00 90 f7 78 f4 ff 7f 00 00 p.x.......x.....
0x555559eb3d40: a0 f7 78 f4 ff 7f 00 00 50 38 81 f4 ff 7f 00 00 ..x.....P8......
0x555559eb3d50: 70 3a 81 f4 ff 7f 00 00 d0 3a 81 f4 ff 7f 00 00 p:.......:......
0x555559eb3d60: b0 f7 78 f4 ff 7f 00 00 40 f9 78 f4 ff 7f 00 00 [email protected].....
0x555559eb3d70: 80 9d 79 f4 ff 7f 00 00 30 9e 79 f4 ff 7f 00 00 ..y.....0.y.....
0x555559eb3d80: 10 98 79 f4 ff 7f 00 00 90 98 79 f4 ff 7f 00 00 ..y.......y.....
0x555559eb3d90: 10 99 79 f4 ff 7f 00 00 b0 9a 79 f4 ff 7f 00 00 ..y.......y.....
0x555559eb3da0: 60 9b 79 f4 ff 7f 00 00 00 ee 7f f4 ff 7f 00 00 `.y.............
0x555559eb3db0: d0 ee 7f f4 ff 7f 00 00 10 e9 7f f4 ff 7f 00 00 ................
0x555559eb3dc0: 20 ea 7f f4 ff 7f 00 00 70 ea 7f f4 ff 7f 00 00 .......p.......
0x555559eb3dd0: 00 ec 7f f4 ff 7f 00 00 80 d0 7f f4 ff 7f 00 00 ................
0x555559eb3de0: 10 d1 7f f4 ff 7f 00 00 80 7a 7f f4 ff 7f 00 00 .........z......
0x555559eb3df0: d0 25 79 f4 ff 7f 00 00 e0 27 79 f4 ff 7f 00 00 .%y......'y.....
0x555559eb3e00: 00 0c 79 f4 ff 7f 00 00 80 11 79 f4 ff 7f 00 00 ..y.......y.....
0x555559eb3e10: 40 1a 79 f4 ff 7f 00 00 10 22 79 f4 ff 7f 00 00 @.y......"y.....
0x555559eb3e20: 10 28 79 f4 ff 7f 00 00 70 2a 79 f4 ff 7f 00 00 .(y.....p*y.....
0x555559eb3e30: 20 2b 79 f4 ff 7f 00 00 10 2c 79 f4 ff 7f 00 00 +y......,y.....
0x555559eb3e40: a0 2b 79 f4 ff 7f 00 00 c0 37 79 f4 ff 7f 00 00 .+y......7y.....
0x555559eb3e50: 20 0a 79 f4 ff 7f 00 00 d0 0b 79 f4 ff 7f 00 00 .y.......y.....
0x555559eb3e60: a0 26 7f f4 ff 7f 00 00 b0 31 7f f4 ff 7f 00 00 .&.......1......
0x555559eb3e70: 20 32 7f f4 ff 7f 00 00 30 29 78 f4 ff 7f 00 00 2......0)x.....
0x555559eb3e80: 20 2a 78 f4 ff 7f 00 00 f0 2a 78 f4 ff 7f 00 00 *x......*x.....
0x555559eb3e90: 30 12 78 f4 ff 7f 00 00 a0 11 78 f4 ff 7f 00 00 0.x.......x.....
0x555559eb3ea0: b0 14 78 f4 ff 7f 00 00 20 1e 78 f4 ff 7f 00 00 ..x..... .x.....
0x555559eb3eb0: a0 14 78 f4 ff 7f 00 00 10 1f 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3ec0: 20 24 78 f4 ff 7f 00 00 80 24 78 f4 ff 7f 00 00 $x......$x.....
0x555559eb3ed0: e0 24 78 f4 ff 7f 00 00 00 25 78 f4 ff 7f 00 00 .$x......%x.....
0x555559eb3ee0: 50 25 78 f4 ff 7f 00 00 80 25 78 f4 ff 7f 00 00 P%x......%x.....
0x555559eb3ef0: b0 25 78 f4 ff 7f 00 00 00 26 78 f4 ff 7f 00 00 .%x......&x.....
0x555559eb3f00: 50 26 78 f4 ff 7f 00 00 b0 17 78 f4 ff 7f 00 00 P&x.......x.....
0x555559eb3f10: b0 16 78 f4 ff 7f 00 00 e0 15 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb3f20: a0 2b 78 f4 ff 7f 00 00 00 2c 78 f4 ff 7f 00 00 .+x......,x.....
0x555559eb3f30: 80 2c 78 f4 ff 7f 00 00 00 2d 78 f4 ff 7f 00 00 .,x......-x.....
0x555559eb3f40: 20 2f 78 f4 ff 7f 00 00 90 2f 78 f4 ff 7f 00 00 /x....../x.....
0x555559eb3f50: 10 f6 7a f4 ff 7f 00 00 b0 ba 7c f4 ff 7f 00 00 ..z.......|.....
0x555559eb3f60: 40 24 7a f4 ff 7f 00 00 90 ba 7c f4 ff 7f 00 00 @$z.......|.....
0x555559eb3f70: a0 ba 7c f4 ff 7f 00 00 90 f6 7a f4 ff 7f 00 00 ..|.......z.....
0x555559eb3f80: f0 f5 7a f4 ff 7f 00 00 00 a7 7c f4 ff 7f 00 00 ..z.......|.....
0x555559eb3f90: e0 a7 7c f4 ff 7f 00 00 90 a8 7c f4 ff 7f 00 00 ..|.......|.....
0x555559eb3fa0: a0 21 7d f4 ff 7f 00 00 c0 40 78 f4 ff 7f 00 00 .!}......@x.....
0x555559eb3fb0: d0 40 78 f4 ff 7f 00 00 e0 40 78 f4 ff 7f 00 00 .@x......@x.....
0x555559eb3fc0: 50 40 78 f4 ff 7f 00 00 90 44 81 f4 ff 7f 00 00 [email protected]......
0x555559eb3fd0: 70 45 81 f4 ff 7f 00 00 f0 42 81 f4 ff 7f 00 00 pE.......B......
0x555559eb3fe0: 80 45 81 f4 ff 7f 00 00 10 3f 81 f4 ff 7f 00 00 .E.......?......
0x555559eb3ff0: f0 1d 78 f4 ff 7f 00 00 80 39 78 f4 ff 7f 00 00 ..x......9x.....
0x555559eb4000: d0 3b 78 f4 ff 7f 00 00 30 3c 78 f4 ff 7f 00 00 .;x.....0<x.....
0x555559eb4010: 40 27 78 f4 ff 7f 00 00 a0 e4 78 f4 ff 7f 00 00 @'x.......x.....
0x555559eb4020: 30 e5 78 f4 ff 7f 00 00 80 15 79 f4 ff 7f 00 00 0.x.......y.....
0x555559eb4030: 50 41 78 f4 ff 7f 00 00 b0 2e 78 f4 ff 7f 00 00 PAx.......x.....
0x555559eb4040: 90 e3 78 f4 ff 7f 00 00 d0 e2 78 f4 ff 7f 00 00 ..x.......x.....
0x555559eb4050: 40 e4 78 f4 ff 7f 00 00 40 2b 78 f4 ff 7f 00 00 @.x.....@+x.....
0x555559eb4060: 40 d4 78 f4 ff 7f 00 00 40 3d 79 f4 ff 7f 00 00 @.x.....@=y.....
0x555559eb4070: 10 3e 79 f4 ff 7f 00 00 e0 37 79 f4 ff 7f 00 00 .>y......7y.....
0x555559eb4080: 10 3a 79 f4 ff 7f 00 00 30 3d 79 f4 ff 7f 00 00 .:y.....0=y.....
0x555559eb4090: 40 22 79 f4 ff 7f 00 00 90 6e f3 f4 ff 7f 00 00 @"y......n......
0x555559eb40a0: 70 4d 81 f4 ff 7f 00 00 80 4d 81 f4 ff 7f 00 00 pM.......M......
0x555559eb40b0: 60 4e 81 f4 ff 7f 00 00 d0 4e 81 f4 ff 7f 00 00 `N.......N......
0x555559eb40c0: f0 4e 81 f4 ff 7f 00 00 60 7d f3 f4 ff 7f 00 00 .N......`}......
0x555559eb40d0: 90 4d 81 f4 ff 7f 00 00 00 00 00 00 00 00 00 00 .M..............
0x555559eb40e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb40f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb41a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb41b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb41c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb41d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb41e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb41f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4260: 00 00 00 00 00 00 00 00 50 e0 f1 f4 ff 7f 00 00 ........P.......
0x555559eb4270: 90 e1 f1 f4 ff 7f 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x555559eb4290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
The pointer at rax + 0x640 is null. So it seems like something is up with the command buffer.
Running the boids example in release mode works fine.
Running the boids example in debug mode on Ubuntu 20.04 with mesa 22.1 also works fine.
@kvark has this "driver returns nullpointer" issue been resolved?
@josh65536 that seems to be a different issue, where the driver function is indeed non-NULL but the internal implementation/state is botched somehow; seems we've got a separate report of "something similar" in #700, but yet to be confirmed with more detailed dumps/stacktraces.
Closing this after not receiving any followup, there appears to be nothing actionable for ash.