Hard freeze with R4.2 on Lenovo ThinkPad X1 Carbon Gen 8

Open uiskla opened this issue 6 months ago • 17 comments

Qubes OS release

R4.2

Brief summary

The system freezes and becomes totally unresponsive after a random amount of time. This can happen in the GUI after startup or while booting up.

Steps to reproduce

I have updated a fully functional R4.1 installation in-place to R4.2. The freezes occur with Kernel 6.1.62 and Kernel 6.6.2.

I have also created a USB stick with the R4.2 installer ISO. The installer also freezes at random times, sometimes when booting, sometimes in the installer GUI.

Since the only difference to my previously working R4.1 installation seems to be a newer Xen version, I have installed the package xen-hypervisor-4.14.6-4.fc32.x86_64.rpm alongside xen-hypervisor-4.17.2. By manipulating the grub command line I have booted with this hypervisor. Of course, no Xen-related functionality is actually working with this hack, because I cannot install other xen packages (dependency problems). But booted like this, the system does not seem to freeze (now running for two hours).

Additional info: A parallel Windows 11 installation works without problems. I have updated the system to the latest BIOS and firmware using Lenovo tools under Windows.

Expected behavior

The system does not freeze and is as stable as it was under R4.1 (and as it is under Windows).

Actual behavior

The system randomly freezes. This occurs not only in the GUI, but also on the console. "dmesg -w" does not show any errors up to the point of the freeze.

uiskla avatar Jan 04 '24 08:01 uiskla

I am experiencing the exact same thing with a Thinkpad X13 Yoga (gen1):

  • random freezes during the installation process (sometimes in pre-reboot, sometimes after)
  • kernel latest option in installer doesn't make a difference
  • upgrading from 4.1 sometimes works and other times freezes (only in the post-reboot stages)

deeplow avatar Jan 04 '24 09:01 deeplow

I have seen the same on HP Elitedesk 800 G2 in the installer in UEFI mode, always pre-reboot. Interestingly, installing in legacy mode (after modifying MBR partition table for the installer so it actually boots in legacy mode) worked without this problem.

On the installed system I observed occasional freezes similar to those described here (no response whatsoever; judging by the fan noise, the CPU heats up during them), but I wouldn't say they are as random as during the installation. They happened when I wanted to reboot the platform, either right after clicking the user's name in the top right (the menu doesn't expand in that case), or after choosing Log Out... from that menu (after the menu had disappeared, but before the next window pops up). Those freezes were frequent enough to be annoying, but I can't reproduce them now, so maybe some specific conditions are needed for them to occur. Most (if not all) of the reboot attempts that froze happened after I upgraded or installed new packages; I'm not sure whether that was a coincidence.

krystian-hebel avatar Jan 08 '24 15:01 krystian-hebel

Could it be related to HWP (CPU power management)? Can you try adding cpufreq=xen:hwp=off to the Xen cmdline?

marmarek avatar Jan 08 '24 19:01 marmarek

This Xen command line option fixes it for me (tested with xen-4.17.2-8 and kernel-6.6.9-1 from qubes-dom0-current-testing). Additionally, the thermals seem to be much better with this setting.

I have tested several times with the stress command, which would otherwise provoke a freeze quite quickly. I also let the system run for some time while testing some functionality (which would previously also have resulted in a freeze sooner or later).

Please let me know if I should perform additional tests, possibly with new software versions or other settings.

uiskla avatar Jan 09 '24 12:01 uiskla

i think i am encountering this with fresh install of R4.2, dom0 up-to-date. on x230 with Heads. "this" being the consistent hard freeze when load becomes higher, needing to hard reset computer.

~~how do i add this command line option in Heads? the /boot is read only, not able to add kexec-menu.txt with modified grub entry for booting (if i understand the documentation correctly which i probably dont).~~ i needed to access it from within Qubes rather than from Heads recovery shell.

fix was insufficient on its own, combined with latest kernel it is currently stable :crossed_fingers:

also to clarify i had some bad memory cards and swapping in some known-good ones may have also contributed to the fix for me.

mfc avatar Jan 15 '24 20:01 mfc

I can confirm this issue for Lenovo T490, however hwp increases battery drain noticeably. Details in #4604: https://github.com/QubesOS/qubes-issues/issues/4604#issuecomment-1903483300

rwiesbach avatar Jan 22 '24 08:01 rwiesbach

With Fedora 39 and Xen + HWP, I saw some random freezes after some uptime, but I think they were unrelated to HWP. Sometimes the dom0 journal shows a kernel list_del corruption error as the last message, but there are no details on where the error occurred. i.e. it seems the kernel froze before outputting the message. Other times the system froze without anything in the journal. That was seen with 6.6.3 and 6.6.9. I don't think I've seen any freezes since 6.6.11

jandryuk avatar Jan 22 '24 17:01 jandryuk

> Sometimes the dom0 journal shows a kernel list_del corruption error as the last message

I've seen that once with 6.5.6. There is also https://github.com/QubesOS/qubes-issues/issues/8794, but that's a reliable crash, not something that happens after some time.

marmarek avatar Jan 23 '24 03:01 marmarek

> i think i am encountering this with fresh install of R4.2, dom0 up-to-date. on x230 with Heads. "this" being the consistent hard freeze when load becomes higher, needing to hard reset computer.

If I am not mistaken this configuration is pretty much the same as the "Privacy Beast" certified device.

I am thus bumping this because even though it's not reproducible yet, it is a 4.2 regression that is currently preventing people from migrating to 4.2 and the upgrade deadline is approaching. On top of that, it does seem to affect certified devices.

Unfortunately I cannot use the device for helping narrow this down because it's currently in use.

deeplow avatar Mar 25 '24 11:03 deeplow

> > i think i am encountering this with fresh install of R4.2, dom0 up-to-date. on x230 with Heads. "this" being the consistent hard freeze when load becomes higher, needing to hard reset computer.
>
> If I am not mistaken this configuration is pretty much the same as the "Privacy Beast" certified device.
>
> I am thus bumping this because even though it's not reproducible yet, it is a 4.2 regression that is currently preventing people from migrating to 4.2 and the upgrade deadline is approaching. On top of that, it does seem to affect certified devices.
>
> Unfortunately I cannot use the device for helping narrow this down because it's currently in use.

yes is pretty much same as privacybeast, however my report is unreliable as i had bad memory sticks, not having the issues any more.

mfc avatar Mar 25 '24 12:03 mfc

> yes is pretty much same as privacybeast, however my report is unreliable as i had bad memory sticks, not having the issues any more.

Ah, I see. Originally my device also had bad memory sticks, and it would arbitrarily fail (RAM issues) and sometimes freeze (this issue). Then I got a replacement device of the same model, and after that only the freezes persisted. But I am happy to see that your problem was resolved.

deeplow avatar Mar 25 '24 12:03 deeplow

I'm experiencing this on a T490. Is the advice in https://github.com/QubesOS/qubes-issues/issues/8825#issuecomment-1881729655 (to try cpufreq=xen:hwp=off) still applicable? This is a test machine so happy to provide whatever logs / try other debugging steps as needed.

legoktm avatar Apr 02 '24 19:04 legoktm

> Is the advice in #8825 (comment) (to try cpufreq=xen:hwp=off) still applicable?

I did end up trying this and it fixed the freezing issue entirely.

legoktm avatar Apr 12 '24 16:04 legoktm

Warning: It seems that the update from 4.2 to 4.2.1 restores the Xen command line to its default and thereby removes the cpufreq=xen:hwp=off workaround (as well as other tweaks).

rwiesbach avatar Apr 16 '24 08:04 rwiesbach

It just happened again - the xen commandline was reset and the system started to freeze randomly again. There should really be a fix that is update-safe, given that 4.2 is mandatory for updates in about 6 weeks (which means that many more systems affected by the bug have to be migrated ...)

rwiesbach avatar May 08 '24 12:05 rwiesbach

Do not modify existing lines in /etc/default/grub, but add the option on a new line, like this:

GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT cpufreq=xen:hwp=off"

This way it will not be changed on update
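
A minimal sketch of why the appended line composes with whatever is already set, assuming only that /etc/default/grub is read as a sequence of shell variable assignments (grub2-mkconfig sources it), so a later line can reference the value from an earlier one. The first assignment below is a hypothetical placeholder, not the actual Qubes default, and the simulation uses a temporary file instead of touching the real configuration:

```shell
# Simulate a defaults file with an existing line plus the appended line.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# Hypothetical existing line (placeholder values, managed by the distribution):
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M"
# Line added by the user; it extends the value instead of replacing it:
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT cpufreq=xen:hwp=off"
EOF

# Source the file the way a shell script would and show the result.
. "$tmp"
echo "$GRUB_CMDLINE_XEN_DEFAULT"
rm -f "$tmp"
```

With these placeholder values the final variable contains the original options followed by cpufreq=xen:hwp=off.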

marmarek avatar May 08 '24 12:05 marmarek

So /etc/default/grub is partially reset on update, but not fully? strange.

For other readers: In order to apply /etc/default/grub manually (so that it takes effect for the current system, not only after updates), I think
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

is the correct way. This way you do not have to manually change one file for the current system and another for "survives updates".

It seems that setting a parameter a second time on the xen commandline replaces the first value, which means that you can use this method to change xen parameters as well.
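
The override behaviour reported here can be modelled with a small shell function. This is only a sketch of last-occurrence-wins semantics as observed in this thread, not Xen's actual parser; effective_value is a hypothetical helper name and the sample command line is made up:

```shell
# effective_value KEY CMDLINE
# Print the value of the last KEY=... token in a space-separated command line.
effective_value() {
    key=$1
    value=
    # Unquoted on purpose: split the command line into whitespace-separated tokens.
    for tok in $2; do
        case $tok in
            "$key"=*) value=${tok#"$key"=} ;;  # a later match overwrites an earlier one
        esac
    done
    printf '%s\n' "$value"
}

# Made-up example: cpufreq appears twice, the second occurrence is reported.
effective_value cpufreq "console=none cpufreq=xen cpufreq=xen:hwp=off"
```

On a running system, the string to inspect could presumably be taken from the xen_commandline field that `xl info` prints in dom0.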

rwiesbach avatar May 08 '24 15:05 rwiesbach