[Linux] [Mesa RADV] Segfault when launching a vulkan app with validation layers after latest mesa 25.2.7-2 and linux/amdgpu-firmware 20251111-1 updates
Environment:
- OS: Fedora 43
- GPU and driver version: AMD Radeon 780M Graphics (RADV PHOENIX), Mesa radv 25.2.7
- SDK or header version if building from repo: Vulkan SDK 1.4.328.1
- Options enabled (synchronization, best practices, etc.): None. Validation layers enabled but all the options disabled
- Kernel version: Linux fedora 6.17.8-300
Describe the Issue I am developing a Vulkan renderer and, after the latest mesa-vulkan-drivers-0:25.2.7-2.fc43, linux-firmware-0:20251111-1.fc43 and amd-gpu-firmware-0:20251111-1.fc43 updates, it crashes instantly, even after I rebuild it from scratch. The crash happens when calling the InstanceBuilder::build() function from the Vk-Bootstrap library, which turns out to be the first Vulkan call of my program. If I disable the validation layers, the program works fine. Furthermore, I can confirm that it works fine with mesa 25.2.6-1 and firmware 20251021-1. I attach below the system logs from the crash.
Expected behavior The program should launch.
Valid Usage ID I encounter a SIGSEGV: "terminated by signal SIGSEGV (Address boundary error)".
Additional context
By the way, I reported the issue in Fedora's bugzilla and in mesa's repository before ending up here. These are the links: https://bugzilla.redhat.com/show_bug.cgi?id=2415839 https://gitlab.freedesktop.org/mesa/mesa/-/issues/14322#note_3199622
@Txordi00 how did you enable/disable the Validation layers? Can you run with export VK_LOADER_DEBUG=layer and print the layer chain you have... this smells like a case of a bad layer in between
Also is this with the SDK version of VVL or your own? Curious if the issue persists both ways
also for the record, I develop on a System76 RADV laptop running Fedora 42 right now... can try to upgrade my OS (finally) to 43 and see if I can reproduce
update - I got to Fedora 43, can't reproduce with any Vulkan app with SDK or latest Validation Layers
@Txordi00 how did you enable/disable the Validation layers, can you run with export VK_LOADER_DEBUG=layer and print the layer chain you have... this smells like a case of a bad layer in between
I enable and disable the validation layers from vkconfig-gui. Like this it works:
The output with the VK_LOADER_DEBUG=layer env:
Also is this with the SDK version of VVL or your own? Curious if the issue persists both ways
also for the record, I develop on a System76 RADV laptop running Fedora 42 right now... can try to upgrade my OS (finally) to 43 and see if I can reproduce
update - I got to Fedora 43, can't reproduce with any Vulkan app with SDK or latest Validation Layers
I use the latest SDK downloaded from https://vulkan.lunarg.com/. So, it's not my own build. Does it take long or a lot of packages to compile btw?
In Fedora 43 I have the updates-testing repo enabled. These latest updates are from there.
Can you run the vkcube in your SDK, on the Mesa issue you said "run any Vulkan app"... is the issue there when you run any Vulkan application, or just yours? (difference if this is something system wide or the way your app is hooking into the layers)
Does it take long or a lot of packages to compile btw?
It is pretty simple
git clone https://github.com/KhronosGroup/Vulkan-ValidationLayers.git
cd Vulkan-ValidationLayers
cmake -S . -B build -D UPDATE_DEPS=ON -D BUILD_TESTS=OFF -D CMAKE_BUILD_TYPE=Debug
cmake --build build
My bad, it said "a Vulkan app", but I corrected it to "my Vulkan renderer". Sorry for ringing all the alarms...
I tried to run vkcube, and it runs just fine. Seems that the issue is specific to my app... Should I build the SDK then? Thanks for the help.
I tried to run vkcube, and it runs just fine. Seems that the issue is specific to my app
This is good info to know! So one last thing to try is going export VK_LOADER_DEBUG=error,warn and see if the Loader is trying to report there is something wrong
Should I build the SDK then?
Before that, is there a way you could share a binary of your renderer, or can I build it somewhere? (If it's something you don't want to share publicly, email me at spencer@lunarg.com and I'll send you a link to an internal FTP website where you can share things with me only.)
This is good info to know! So one last thing to try is going export VK_LOADER_DEBUG=error,warn and see if the Loader is trying to report there is something wrong
I only get the following:
[WARNING: General]
terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib64/libvulkan_dzn.so. Skipping this driver.
[Vulkan Loader] WARNING: terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib64/libvulkan_dzn.so. Skipping this driver.
fish: Job 1, './lrt' terminated by signal SIGSEGV (Address boundary error)
Before that, is there a way you could share a binary of your renderer or can I build it somewhere (if its something you don't want to share publicly, email me at spencer@lunarg.com and I'll send you a link to an internal FTP website you can share things with me only
Sent!
ok, so I was able to build and I can see the chess board rendering
So if you turn off vkconfig, the request_validation_layers() line in the bootstrap code finds the validation layers for me automatically and loads them
I tried both as is, and with the line removed and opened vkconfig and ran
I was able to successfully load things
... this just smells like a classic "system env setup issue".. and looking at your last error
[Vulkan Loader] WARNING: terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib64/libvulkan_dzn.so. Skipping this driver.
I actually have zero idea why you have libvulkan_dzn.so, this is the driver to map Vulkan onto D3D for Windows
could you send me (here or email) a VK_LOADER_DEBUG=all ./lrt and let me look
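For anyone wanting to attach such a log, a minimal sketch follows. It assumes the loader writes its debug messages to stderr; capture_loader_log and the log file name are hypothetical names chosen for illustration, not part of the loader or the SDK.

```shell
# capture_loader_log is a hypothetical helper name (an assumption, not an SDK tool).
# It runs the given command with VK_LOADER_DEBUG=all set for that command only
# and redirects the command's stderr into the given file.
capture_loader_log() {
    log_file="$1"
    shift
    VK_LOADER_DEBUG=all "$@" 2>"$log_file"
}
```

Example: capture_loader_log loader_all.log ./lrt, then attach loader_all.log. The variable is only set for that one run, so the rest of the shell session is unaffected.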
@charles-lunarg need second eyes here
- User has a crash that seems to be happening in vkEnumeratePhysicalDevices inside VVL (but I couldn't reproduce it locally)
- Looking at the above comment, do you spot anything strange?
- The crash dump shows thread safety, is there some known threading issue going on here? (granted, it crashes with all the settings turned off too)
- There are a lot of drivers found, not sure if that has anything to do with it
Hi,
I'm not convinced the issue lies with the validation layers, I ran into a similar issue after updating my fedora 43 system today.
- vkcube => works, no layers are enabled due to an error in mesa_device_select, see normal.log: Failed to find vkGetDeviceProcAddr in layer "libVkLayer_MESA_device_select.so"
- vkcube --validate => segfaults, both layers are enabled, see with_validation.log
- NODEVICE_SELECT=1 vkcube --validate => works, only the validation layer is enabled, see with_validation_without_devselect.log
Here's my vulkaninfo.log, and I'm using vulkan-validation-layers-1.4.321.0-3.fc43.x86_64, and mesa-vulkan-drivers-25.2.7-2.fc43.x86_64
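The three runs above can be compared with a small wrapper that reports whether a command exited normally or was killed by a signal. This is only a sketch: describe_run is a hypothetical helper name, and it relies on the usual shell convention that a process killed by signal N shows up as exit status 128 + N.

```shell
# describe_run is a hypothetical helper name, not part of any Vulkan tool.
# Usage: describe_run LABEL COMMAND [ARGS...]
describe_run() {
    label="$1"
    shift
    # Discard the command's own output; only the exit status matters here.
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -gt 128 ]; then
        # Death by signal N is reported as 128 + N (SIGSEGV is signal 11).
        echo "$label: killed by signal $((status - 128))"
    elif [ "$status" -eq 0 ]; then
        echo "$label: ok"
    else
        echo "$label: exit status $status"
    fi
}
```

For example, describe_run "validation" vkcube --validate and describe_run "validation, no device_select" env NODEVICE_SELECT=1 vkcube --validate. Note that a working vkcube keeps running until its window is closed, so the "ok" case only prints after you close it.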
I am also facing the same issue on Fedora 43. vkcube and my application work only with validation layers disabled. Using vulkan-validation-layers-1.4.321.0-3.fc43.x86_64 and mesa-vulkan-drivers-25.2.7-2.fc43.x86_64. Here is my gdb backtrace log
vulkan_gdb_log.txt
Well, the libvulkan_dzn.so warning was occurring in earlier versions as well. The vkCreateInstance crash only happened after updating from mesa 25.2.6 --> 25.2.7
Yeah no change on MESA_device_select when diffing these mesa release tags, but as stated in https://bodhi.fedoraproject.org/updates/FEDORA-2025-82b66363b4 25.2.7 also includes this mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38252
Edit: The bug is likely due to the removal of this line: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38252#note_3205037
Same problem on Fedora 43. gdb gives a different stack when I run my application, so I think it is some multi-threading issue. I remember I sometimes got a StartRead() crash, as the earlier one says.
0x00007fffdd85f236 in threadsafety::Counter<VkInstance_T*>::FindObject(VkInstance_T*, Location const&) () from /lib64/libVkLayer_khronos_validation.so
(gdb) bt full
#0 0x00007fffdd85f236 in threadsafety::Counter<VkInstance_T*>::FindObject(VkInstance_T*, Location const&) () from /lib64/libVkLayer_khronos_validation.so
No symbol table info available.
#1 0x00007fffdd85e8d3 in threadsafety::Counter<VkInstance_T*>::StartRead(VkInstance_T*, Location const&) () from /lib64/libVkLayer_khronos_validation.so
No symbol table info available.
#2 0x00007fffdd820ee3 in threadsafety::Instance::PreCallRecordEnumeratePhysicalDevices(VkInstance_T*, unsigned int*, VkPhysicalDevice_T**, RecordObject const&) () from /lib64/libVkLayer_khronos_validation.so
No symbol table info available.
#3 0x00007fffdd5176c7 in vulkan_layer_chassis::EnumeratePhysicalDevices(VkInstance_T*, unsigned int*, VkPhysicalDevice_T**) () from /lib64/libVkLayer_khronos_validation.so
No symbol table info available.
#4 0x00007fffdd5176df in vulkan_layer_chassis::EnumeratePhysicalDevices(VkInstance_T*, unsigned int*, VkPhysicalDevice_T**) () from /lib64/libVkLayer_khronos_validation.so
No symbol table info available.
#5 0x00007fffdd5176df in vulkan_layer_chassis::EnumeratePhysicalDevices(VkInstance_T*, unsigned int*, VkPhysicalDevice_T**) () from /lib64/libVkLayer_khronos_validation.so
No symbol table info available.
#6 0x00007fffdd5176df in vulkan_layer_chassis::EnumeratePhysicalDevices(VkInstance_T*, unsigned int*, VkPhysicalDevice_T**) () from /lib64/libVkLayer_khronos_validation.so
No symbol table info available.
#7 0x00007fffdd5176df in vulkan_layer_chassis::EnumeratePhysicalDevices(VkInstance_T*, unsigned int*, VkPhysicalDevice_T**) () from /lib64/libVkLayer_khronos_validation.so
No symbol table info available.
#8 0x00007fffdd5176df in vulkan_layer_chassis::EnumeratePhysicalDevices(VkInstance_T*, unsigned int*, VkPhysicalDevice_T**) () from /lib64/libVkLayer_khronos_validation.so
Ok, so grabbed the latest Mesa, this 100% seems like a
If you go NODEVICE_SELECT=1, it will turn off the VK_LAYER_MESA_device_select layer and then VVL works correctly
Happy to keep this issue open, but it seems the links @Jiboo posted above are more on the right path here, we need to fix this in Mesa
For those who want to "power through it", use export NODEVICE_SELECT=1 temporarily
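If you would rather not export the variable for the whole session, a per-command variant could look like the sketch below. without_device_select is a hypothetical helper name; the only thing taken from this thread is the NODEVICE_SELECT=1 variable itself.

```shell
# without_device_select is a hypothetical wrapper name (an assumption).
# It sets NODEVICE_SELECT=1 only for the command it launches, so other
# programs started from the same shell still get the device select layer.
without_device_select() {
    NODEVICE_SELECT=1 "$@"
}
```

Example: without_device_select vkcube --validate, or without_device_select ./lrt for the renderer discussed here.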
I can confirm that vkcube --validate crashes for me and that with NODEVICE_SELECT=1 both vkcube --validate and my own app do not crash anymore.
Final Update - https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38252#note_3206915
So the buggy device select layer is packaged at /lib64/libVkLayer_MESA_device_select.so. It seems the bad MR was cherry-picked in Fedora, we all grabbed it, and it will be fixed soon
If you "need" a working device select layer now (and NODEVICE_SELECT=1 is not good enough), build mesa with the MR above using -Dvulkan-layers=device-select and set your VK_LAYER_PATH to the installed library; it will then look like the following in VK_LOADER_DEBUG=layer
[Vulkan Loader] LAYER: VK_LAYER_MESA_device_select
[Vulkan Loader] LAYER: Type: Implicit
[Vulkan Loader] LAYER: Enabled By: Implicit Layer
[Vulkan Loader] LAYER: Disable Env Var: NODEVICE_SELECT
[Vulkan Loader] LAYER: Manifest: /usr/share/vulkan/implicit_layer.d/VkLayer_MESA_device_select.json
[Vulkan Loader] LAYER: Library: /home/fricke/install/mesa/lib64/libVkLayer_MESA_device_select.so
[Vulkan Loader] LAYER: ||
... or just wait, I assume Dave will release the fix very soon and a dnf update will pick it up
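To check quickly which layers show up in a VK_LOADER_DEBUG=layer log like the one above, something along these lines may help. It is a rough sketch: list_layers is a hypothetical helper, and it simply prints every distinct VK_LAYER_* name mentioned in the log, so a name appearing in the list does not by itself prove the layer ended up enabled.

```shell
# list_layers is a hypothetical helper name (an assumption).
# It reads loader debug output on stdin and prints each distinct
# VK_LAYER_* identifier found in it, one per line, sorted.
list_layers() {
    grep -o 'VK_LAYER_[A-Za-z0-9_]*' | sort -u
}
```

Example: VK_LOADER_DEBUG=layer ./lrt 2>&1 | list_layers. The Library: line in the full log is still the place to confirm which .so file was actually loaded.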
Thanks a lot @spencer-lunarg and to all the people involved in tracking & solving that MR!