dom0 Kernel latest (6.16+) doesn't start - amdxdna expects iommu handling in dom0
Qubes OS release
4.3rc2
Brief summary
With kernel latest installed, boot doesn't reach the point of asking me for the LUKS encryption passphrase.
Steps to reproduce
install kernel latest
Expected behavior
being asked for the disk encryption passphrase
Actual behavior
kernel stops at 7 seconds: amdxdna: probe with driver amdxdna failed with error -5
followed by amdgpu complaining about not finding optional firmware (amdgpu/isp_4_1_0.bin) at second 10.
then it stops for minutes.
https://c.ymy.be/s/zGEeKEoiZANdFw9 (ipv6 required)
Additional information
on 6.15.11-1 it still works:
...
[ 7.840194] hid-generic 0018:093A:0255.0001: input,hidraw0: I2C HID v1.00 Mouse [UNIW0001:00 093A:0255] on i2c-UNIW0001:00
[ 7.863081] ACPI: video: Video Device [VGA] (multi-head: yes rom: no post: no)
[ 7.864326] amdxdna 0000:67:00.1: enabling device (0000 -> 0002)
[ 7.867003] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:1a/LNXVIDEO:00/input/input6
[ 7.867631] xen: registering gsi 43 triggering 0 polarity 1
[ 7.870125] xen: --> pirq=43 -> irq=43 (gsi=43)
[ 7.872792] amdxdna 0000:67:00.1: [drm] *ERROR* aie2_init: Enable PASID failed, ret -19
[ 7.876202] amdxdna 0000:67:00.1: [drm] *ERROR* amdxdna_probe: Hardware init failed, ret -19
[ 7.882079] sdhci: Secure Digital Host Controller Interface driver
...
Device is: AMD Ryzen AI 9 HX 370 w/ Radeon 890M
I also confirm there is something wrong with 6.16.8-1. I also get errors about amdxdna but it lets me enter the LUKS password and then hangs on the 3 dots screen. Booting with nomodeset and qubes.skip_autostart and then Alt+F2 let me login to Dom0 so I imagine this is an amdgpu driver issue. Reverting to 6.15.11-1 boots normally. I'm on a laptop with an 8xxxHS series processor.
i'm not sure if it's amdgpu, amdxdna or something else entirely
today i installed
amd-gpu-firmware.noarch 1:20251011-1.fc41 qubes-dom0-cached
amd-ucode-firmware.noarch 1:20251011-1.fc41 qubes-dom0-cached
linux-firmware-whence.noarch 1:20251011-1.fc41 qubes-dom0-cached
no improvement.
here is dmesg dom0 from the working 6.15.11 2025-10-12_61511.txt
i don't know how i could get dmesg from the 6.16.8 kernel; it's not listed in journalctl --list-boots :/
tested 6.17.4 from today, no success.
here are 2 screen pictures (i couldn't get the whole screen in focus...):
Upstream bug report: https://gitlab.freedesktop.org/drm/amd/-/issues/4656
See new comments in the above issue. I can also prepare kernel built with requested commit reverted.
You can get patched kernel via unstable repo:
sudo qubes-dom0-update --enablerepo=qubes-dom0-unstable --action=update kernel-latest
You should get version 6.17.4-1.qubes.1.fc41 (note the 1 after qubes)
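One way to confirm that the patched build was actually installed is to look for the extra `.1` after `qubes` in the release string. The following is only a sketch: `is_patched_build` is a hypothetical helper name, and it assumes the unpatched package would read `6.17.4-1.qubes.fc41`.

```shell
# Hypothetical helper: succeed only if the version string carries the
# extra ".1" after "qubes" (e.g. 6.17.4-1.qubes.1.fc41).
is_patched_build() {
    case "$1" in
        *.qubes.1.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use in dom0 (not executed here). With several kernel-latest
# versions installed, rpm prints one line each; this takes the last one listed.
#   v=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel-latest | tail -n 1)
#   is_patched_build "$v" && echo "patched build installed: $v"
```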
thanks i will try to reboot in lunch break :)
( also i will try to set the grub boot things from upstream issue comments )
booted the qubes.1 kernel, same result. it gets stuck at the following screen:
in a few hours i will make the next try with the mentioned dcdebugmask values; i guess i put them into grub.
strong indicator for me that the suspicion of a graphics issue could be true: the external monitors don't switch on.
back with new dmesg, xl dmesg and beautiful screen pictures (:
nomodeset:
2025-10-24_1718_nomodeset.xen.txt 2025-10-24_1718_nomodeset.txt
it did not boot with dcdebugmask
0x400: i didn't notice a difference, so i didn't take a picture.
0x800:
0x10:
I see you managed to capture errors from amdgpu driver, yes, that is likely very helpful
a framework user seems to run into the same issue.
https://forum.qubes-os.org/t/hcl-framework-13-2025-ryzen-ai-300/34846/2
Just tried installing QubesOS R4.3.0-rc3-x86_64 using the latest kernel option. Including the two optional components for the desktop environment.
What works:
- Setup boots and installs to disk all as it should.
- Boot password gets asked.
What doesn't:
- I get a black screen with a single "_" at the top left corner after entering the disk encryption password. No login screen is reached.
When switching to:
- tty1: all I see is
[ 2.789723] dracut-cmdline[829]: Warning: USB in dom0 is not restricted. Consider rd.qubes.hide_all_usb or usbcore.authorized_default=0
- tty2 to 6 and tty8 to 12: A blinking "_" in the top left corner.
- tty7: Entirely black, not even the blinking cursor.
@agowa what hardware? The behavior you describe (black screen after disk password, instead of freeze before) suggests a different issue.
CPU: 2x AMD EPYC
GPU: Intel A380 (+ a small onboard one for the iKVM)
Motherboard: SuperMicro H11DSi
Installed onto one of two NVMe (with ArchLinux installed on the other one).
So, significantly different platform (Intel GPU instead of AMD, EPYC instead of Ryzen). That's a different issue. Anyway, check if the system isn't simply using different output - try connecting monitor to a different port (or check if it's visible on iKVM). If still nothing, open a separate issue, or maybe even better ask on https://forum.qubes-os.org.
Sorry, my bad then. Misread and thought it was AMD CPU and not GPU related...
And no, it's not on a different output (sadly). I already checked that.
i noticed that a reseller of the tongfang laptop model i have issues a warning to not upgrade the GPU drivers:
https://go.xmg.gg/xmg-evo-e25-amd-driver-notice_en
maybe this applies not only to windows but also to the linux kernel?
november firmware no change
addition: booted a cachyos 6.17.8 kernel (and wayland) bare metal successfully
lmk if i should provide dmesg from that.
finally got qubes builder to run - and it breaks apparently with merges from 6.15.11 to 6.16 (.0)
now i try to figure out this bisecting thing...
6.18 on bare metal seems to work. (see earlier remarks on cachyos)
It will be most effective if you build directly from Linux git clone, skipping qubes builder... You'll need either a trusted VM, or building it in dom0. The former is more correct approach, the latter is slightly easier. Generally, the approach would be:
- git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (it will take a few GB of space)
- git verify-tag v6.16 (you'll need to import keys first - you have them in builderv2/artifacts/sources/linux-kernel/*.asc already)
- Take kernel config from /boot/config-* and copy to .config in kernel sources.
- git bisect start v6.16 v6.15.11
And then test the version you get this way, after which you select git bisect good or git bisect bad to report the result and get new version to test.
The build itself would be something like this (make a short script, it will be easier this way):
set -xe
make olddefconfig
make -j$(nproc) WERROR=0
make modules_install install
The above would work in dom0. If building in VM, replace the last line with:
rm -rf out
mkdir -p out
make INSTALL_MOD_PATH=$PWD/out INSTALL_PATH=$PWD/out INSTALLKERNEL=/bin/true modules_install install
And then copy the contents of the out dir to the appropriate places in dom0 (/lib/modules and /boot, mirroring the layout in the out dir), and then call dracut -f --kver $NEW_KERNEL_VERSION && grub2-mkconfig -o /boot/grub2/grub.cfg (where $NEW_KERNEL_VERSION is the version as in the out/lib/modules/ dir name).
Note you'll need to manually remove old (test) kernels after this whole operation...
At each iteration, make sure you boot the kernel you just built, not just the newest one - might be easier if you remove the one from previous attempt from /boot before copying new one in.
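The two bookkeeping steps above (finding $NEW_KERNEL_VERSION from the out dir, and checking after reboot that the freshly built kernel is the one running) can be sketched as small shell helpers. The function names `new_kernel_version` and `running_kernel_is` are made up for illustration, and the sketch assumes the out dir holds exactly one build.

```shell
# Hypothetical helper: print the single version directory found under
# <out>/lib/modules; fail if there is none or more than one.
new_kernel_version() {
    out_dir=${1:-out}
    count=0
    for d in "$out_dir"/lib/modules/*/; do
        [ -d "$d" ] || continue
        count=$((count + 1))
        found=$d
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one version under $out_dir/lib/modules, found $count" >&2
        return 1
    fi
    basename "$found"
}

# Hypothetical helper: succeed only if the running kernel (uname -r)
# matches the version that was just built.
running_kernel_is() {
    expected=$1
    actual=$(uname -r)
    if [ "$actual" != "$expected" ]; then
        echo "booted $actual, expected $expected" >&2
        return 1
    fi
}

# Possible use (not executed here):
#   NEW_KERNEL_VERSION=$(new_kernel_version out)
#   ... copy files, run dracut and grub2-mkconfig as described, reboot ...
#   running_kernel_is "$NEW_KERNEL_VERSION" || echo "wrong kernel booted"
```

Running the second check before typing git bisect good or git bisect bad guards against marking a commit based on the wrong kernel.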
ah thank you. that makes things easier, i will try that soon. this weekend i did 2 iterations of bad, each took me several hours :)
note/edit for future generations: use the config file from dom0.... it's much faster than the whole kernel for a generic domU... now compile runs take 20 minutes or so, not 2 hrs....
Oh well, i guess its IOMMU again?
[user@kernel linux]$ git bisect good
7c8896dd4a2a27c84b04dcf0990e6f6b118cb6b2 is the first bad commit
commit 7c8896dd4a2a27c84b04dcf0990e6f6b118cb6b2
Author: Jason Gunthorpe <[email protected]>
Date: Fri Apr 18 16:01:24 2025 +0800
iommu: Remove IOMMU_DEV_FEAT_SVA
None of the drivers implement anything here anymore, remove the dead code.
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
Tested-by: Zhangfei Gao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
drivers/accel/amdxdna/aie2_pci.c | 13 ++-----------
drivers/dma/idxd/init.c | 8 +-------
drivers/iommu/amd/iommu.c | 2 --
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 --
drivers/iommu/intel/iommu.c | 6 ------
drivers/iommu/iommu-sva.c | 3 ---
drivers/misc/uacce/uacce.c | 9 ---------
include/linux/iommu.h | 9 +--------
8 files changed, 4 insertions(+), 48 deletions(-)
[user@kernel linux]$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# bad: [038d61fd642278bab63ee8ef722c50d10ab01e8f] Linux 6.16
git bisect bad 038d61fd642278bab63ee8ef722c50d10ab01e8f
# status: waiting for good commit(s), bad commit known
# good: [0ff41df1cb268fc69e703a08a57ee14ae967d0ca] Linux 6.15
git bisect good 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
# good: [43db1111073049220381944af4a3b8a5400eda71] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 43db1111073049220381944af4a3b8a5400eda71
# bad: [11fcf368506d347088e613edf6cd2604d70c454f] uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again
git bisect bad 11fcf368506d347088e613edf6cd2604d70c454f
# bad: [ec71f661a572a770d7c861cd52a50cbbb0e1a8d1] Merge tag 'soc-dt-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad ec71f661a572a770d7c861cd52a50cbbb0e1a8d1
# bad: [9d49da438819c5dd82840eb63d929edbdccb80d8] Revert "iommu: make inclusion of arm/arm-smmu-v3 directory conditional"
git bisect bad 9d49da438819c5dd82840eb63d929edbdccb80d8
# good: [d8441523f21375b11a4593a2d89942b407bcb44f] Merge tag 'f2fs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
git bisect good d8441523f21375b11a4593a2d89942b407bcb44f
# good: [eafd95ea74846eda3e3eac6b2bb7f34619d8a6f8] Merge tag 'pinctrl-v6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect good eafd95ea74846eda3e3eac6b2bb7f34619d8a6f8
# good: [dd91b5e1d6448794c07378d1be12e3261c8769e7] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
git bisect good dd91b5e1d6448794c07378d1be12e3261c8769e7
# bad: [879b141b7cfa09763f932f15f19e9bc0bcb020d5] Merge branches 'fixes', 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', 'fsl/pamu', 'mediatek', 'renesas/ipmmu', 's390', 'intel/vt-d', 'amd/amd-vi' and 'core' into next
git bisect bad 879b141b7cfa09763f932f15f19e9bc0bcb020d5
# bad: [21c03574df19f0d77cb2e4d28bc02c79b21e656a] iommu: Hide ops.domain_alloc behind CONFIG_FSL_PAMU
git bisect bad 21c03574df19f0d77cb2e4d28bc02c79b21e656a
# good: [d50aaa4a9ffb0149d2187dfe3477300561f06fec] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages
git bisect good d50aaa4a9ffb0149d2187dfe3477300561f06fec
# bad: [17fce9d2336d952b95474248303e5e7d9777f2e0] iommu/vt-d: Put iopf enablement in domain attach path
git bisect bad 17fce9d2336d952b95474248303e5e7d9777f2e0
# good: [249d3327f0236302a92d9eccb2b32f64c8daaf86] iommu/vtd: Remove iommu_alloc_pages_node()
git bisect good 249d3327f0236302a92d9eccb2b32f64c8daaf86
# good: [0da188c8468d8fe544d0aa2a5f610c78b8d34819] iommu: Split out and tidy up Arm Kconfig
git bisect good 0da188c8468d8fe544d0aa2a5f610c78b8d34819
# bad: [7c8896dd4a2a27c84b04dcf0990e6f6b118cb6b2] iommu: Remove IOMMU_DEV_FEAT_SVA
git bisect bad 7c8896dd4a2a27c84b04dcf0990e6f6b118cb6b2
# good: [cfea71aea921311350aabd7d5fc92269a052410e] iommu/arm-smmu-v3: Put iopf enablement in the domain attach path
git bisect good cfea71aea921311350aabd7d5fc92269a052410e
# first bad commit: [7c8896dd4a2a27c84b04dcf0990e6f6b118cb6b2] iommu: Remove IOMMU_DEV_FEAT_SVA
i entered good when the boot got past the point where it used to get stuck and the graphics changed, and bad only when it got stuck in exactly the same way that brought me here in the first place. it was my first bisect, so maaaybe i am mistaken here.
Hm, interesting, theoretically it shouldn't matter, as dom0 is not managing IOMMU (Xen is). But maybe there is some side effect. Normally I'd propose to test v6.16 with that commit reverted to be sure, but it doesn't revert cleanly...
maybe revert just the 2 lines in drivers/iommu/amd/iommu.c?
Posted some findings on the gitlab side. I have an idea: try to blacklist the whole amdxdna module - add to your kernel cmdline module_blacklist=amdxdna
module_blacklist=amdxdna worked for my laptop on 6.17.9-1 (didn't try others). System booted normally and with display =-)
yes, blocking the amdxdna module worked (will put this in etc/default as i personally don't want amdxdna, especially in qubes :) )
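Making the blacklist persistent could look roughly like the following. This is a sketch under assumptions: that dom0 kernel arguments are taken from the GRUB_CMDLINE_LINUX line in /etc/default/grub, and that grub.cfg lives at /boot/grub2/grub.cfg as in the dracut step earlier in this thread. `add_cmdline_param` is a hypothetical helper name; back up the file before trying it.

```shell
# Hypothetical helper: append a parameter to the GRUB_CMDLINE_LINUX="..."
# line of the given file, unless that line already contains it.
add_cmdline_param() {
    file=$1
    param=$2
    # Already present on the GRUB_CMDLINE_LINUX line: nothing to do.
    if grep -q "^GRUB_CMDLINE_LINUX=.*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"\$|\1 $param\"|" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Possible use in dom0 as root (not executed here; paths are assumptions):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_cmdline_param /etc/default/grub module_blacklist=amdxdna
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

The helper is idempotent, so running it twice leaves a single module_blacklist=amdxdna entry.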
should i provide xl dmesg, linux dmesg with blocked amdxdna?