
After updating the dom0 kernel on an i7-9700 system, the disk decryption screen is black or shows no signal

Open • rsta79 opened this issue 9 months ago • 7 comments

Qubes OS release

Qubes OS 4.2

Brief summary

After updating the dom0 kernel (to 6.12.18-1.qubes.fc37.x86_64), the screen is either completely black or shows no signal during the disk decryption phase. The issue can be avoided by booting the fallback entry with the older kernel (6.6.77-1.qubes.fc37.x86_64), which suggests a dom0 kernel regression specific to the i7-9700 system.

Actions Taken:

  • Pressing Esc
  • Pressing Ctrl+Alt+F1 through Ctrl+Alt+F9
  • Power-cycling the entire machine
  • Booting the fallback entry and checking for dom0 updates
  • Typing the disk password blindly at the black screen and pressing Enter

Unfortunately, none of these actions resolved the issue.

I have two Qubes setups:

  • i7-14700K: Functions correctly without any issues.
  • i7-9700: Experiencing the described problem.

Related Issue:

This may be a duplicate of qubes-issues#9737.

Steps to reproduce

  1. Update the dom0 kernel to Linux 6.12.18-1.qubes.fc37.x86_64
  2. Press the power button
  3. Select default boot entry

Expected behavior

A screen prompting for the disk password is displayed.

Actual behavior

The screen is either completely black or the monitor reports no signal.

Additional information

CPU: Intel® Core™ i7-9700 Processor
GPU: Intel® UHD Graphics 630 (iGPU)

Affected kernel version: Linux 6.12.18-1.qubes.fc37.x86_64
Last working kernel version: Linux 6.6.77-1.qubes.fc37.x86_64

rsta79, Apr 08 '25

Update: The latest kernel, 6.12.21-1.qubes.fc37.x86_64, is still affected on the i7-9700 system (the i7-14700K setup works fine).

It's incredibly hard to diagnose what causes this bug: the disk is locked, so there are no logs to pull, and the kernel may not even be running at all.

rsta79, Apr 22 '25

I will try reinstalling Qubes from the ISO to see whether that works.

rsta79, Apr 22 '25

By the way, have you tried pressing Esc, or Ctrl+Alt+F2 (or F3, F4, etc.) to switch TTYs? I've seen even a number of Debian systems have this issue at the LUKS decryption screen.

jmynes, Apr 25 '25

I did, and I can confirm that this issue is a kernel regression. I'm not certain whether it comes from upstream Linux patches or from Qubes patches. I'm currently narrowing down which commit caused the breakage. So far, I can confirm that the issue has been present since 60b2d69 (6.9.7).
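Narrowing a regression down to a single commit is a binary search over the commit history, which is what `git bisect` automates. Below is a minimal Python sketch of that search, assuming the commits are ordered oldest to newest, the first is known good, the last is known bad, and once the bug appears every later commit is also bad. The function name and the `is_good` predicate are hypothetical; in this issue, one `is_good` call would mean building a kernel, booting the i7-9700 machine, and checking whether the decryption prompt appears.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() returns False.

    Assumptions (hypothetical helper, same model as git bisect):
      * commits are ordered oldest to newest,
      * commits[0] is known good and commits[-1] is known bad,
      * the regression is monotonic: once bad, every later commit is bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one known-good and one known-bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):  # e.g. build, boot, check the LUKS prompt
            lo = mid
        else:
            hi = mid
    return commits[hi]


if __name__ == "__main__":
    # Placeholder commit IDs; the bug is simulated as starting at "c6".
    history = [f"c{i}" for i in range(10)]
    print(first_bad_commit(history, lambda c: int(c[1:]) < 6))
```

Each step halves the remaining range, so roughly log2(n) test boots are needed for n candidate commits, which is why bisecting is practical even when every test requires a full kernel build and reboot.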

rsta79, Apr 26 '25

Update:

It appears to be an upstream kernel bug, as the problem also occurs outside Qubes. Specifically, it seems to be an i915 driver regression introduced between kernel versions 6.8.12 and 6.9, with 6.8.12 being the last stable version that worked correctly.

While the kernel itself is running, the graphics are not working. On DisplayPort, I get a black screen or intermittent "no signal" messages; on HDMI, I see a flashing rainbow screen.

I am currently investigating which specific commit introduced this issue, but it is likely related to the i915 driver, as mentioned earlier.

rsta79, Apr 30 '25

This appears to be a regression introduced about a year ago, between kernel versions 6.8.12 and 6.9; kernel-stable#480e035 is the exact commit that introduced the problem.

I have reported this bug to the i915 developers at freedesktop.org. You can find it here: drm/i915#14213

rsta79, May 02 '25

Cross-posted from drm/i915#14213: I believe I misdiagnosed the problem; commit kernel-stable#7627a0edef54 is actually the one to blame.

Cross-posted from https://bugzilla.kernel.org/show_bug.cgi?id=220111: This regression was introduced in commit 7627a0edef54. It appears that ahci.c sets non-host-controller devices (in this case, the Intel VGA device) to ATA_LPM_MIN_POWER, which causes buffer underruns in the Intel iGPU on i7-9700 systems and leaves the display unusable in most cases (129 out of 132 boots). For now, this can be worked around with the kernel parameter ahci.mobile_lpm_policy=0.
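After applying the workaround, it is worth confirming that the parameter actually reached the running dom0 kernel. Below is a minimal Python sketch that reads /proc/cmdline, assuming the parameter was added to GRUB_CMDLINE_LINUX in dom0's /etc/default/grub and the GRUB configuration was regenerated (the exact grub.cfg path depends on the boot mode; check the Qubes documentation). The helper names are hypothetical, and the parser assumes plain whitespace-separated tokens with no quoted values, with a later occurrence of the parameter overriding an earlier one.

```python
from pathlib import Path
from typing import Optional

# Parameter name from the workaround above.
PARAM = "ahci.mobile_lpm_policy"


def lpm_policy_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of ahci.mobile_lpm_policy on a kernel command line, or None.

    Assumption: tokens are whitespace-separated with no quoting, and the
    last occurrence of the parameter is the one that takes effect.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == PARAM and sep:
            value = val  # keep overwriting so the last occurrence wins
    return value


def workaround_active(cmdline: str) -> bool:
    """True if the command line sets ahci.mobile_lpm_policy=0."""
    return lpm_policy_from_cmdline(cmdline) == "0"


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():  # only present on a running Linux system
        print("ahci.mobile_lpm_policy=0 set:", workaround_active(proc_cmdline.read_text()))
```

Run in dom0 after rebooting; if it reports False, the GRUB configuration was most likely not regenerated or the wrong boot entry was selected.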

@marmarek, should we create a patch for the workaround? I can do that if needed, but I'm not sure it's worth it, since I have no idea how widely this regression affects Qubes OS users. Or should we wait for upstream to fix it and then consider backporting that fix?

rsta79, May 12 '25

@andrewdavidwong

I believe we can close this now, as it has been addressed by the upstream libata commit libata/3e0809b1. Once it is merged into the Linux stable branch and Qubes rebases onto a kernel that includes it, the issue will be resolved automatically.

rsta79, Jun 26 '25

Closing as completed. If anyone believes this issue is not yet completed, or if anyone is still affected by this issue, please leave a comment saying so, and we'll be happy to reopen it. Thank you.

andrewdavidwong, Jun 26 '25