Resume from suspending is broken after update to Xen 4.14

Open • jevank opened this issue 4 years ago • 64 comments

Qubes R4.1

I've tested with an X1 ThinkPad gen6. Hyperthreading is disabled in UEFI; kernels 5.4.51 and 5.4.61. With Xen 4.13 everything works fine.

I tried with and without VMs running. Suspend appears to succeed (the ThinkPad's LED blinks smoothly), but the machine doesn't resume.

Is this problem specific to me?


There are multiple bugs causing this; see the summary below.

Summary of S3-related bugs in Xen 4.14:

jevank, Sep 18 '20 12:09

X1 Thinkpad gen6

Was suspend ever working on this model in Qubes? I seem to remember hearing, a long time ago, that it wasn't working.

P.S. -- Please use the issue template. This is borderline unsuitable for qubes-issues.

andrewdavidwong, Sep 18 '20 14:09

Yes, it worked with Xen 4.13 until I upgraded to 4.14.

P.S. -- ok

jevank, Sep 18 '20 14:09

The same happens in openQA: https://github.com/QubesOS/qubes-issues/issues/6049

marmarek, Sep 20 '20 18:09

Bisected to https://github.com/xen-project/xen/commit/633ecc4a7cb

marmarek, Sep 27 '20 13:09

And the fix: https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg01892.html

marmarek, Sep 27 '20 16:09

In my case, the patch makes the machine reboot during resume...

jevank, Sep 28 '20 13:09

I do have a system where resume is still broken, but it now looks like a proper panic instead of an instant reboot (a 5 s timeout before the reboot, instead of an instant one). Sadly I don't have a serial console there, so I don't know what really happened.

marmarek, Sep 28 '20 13:09

Looks like I have a similar timeout, and I have no serial port on my laptop either.

jevank, Sep 28 '20 15:09

Discussion on xen-devel: https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg02074.html

marmarek, Sep 29 '20 20:09

Seeing the same on a System76 Galago Pro 4. Suspend worked on Xen 4.13. On 4.14 the fans and screen wake up and the Fn-1 fan control works, but the laptop doesn't respond to anything else, and there is nothing in the logs after suspending.

pwmarcz, Sep 30 '20 08:09

@pwmarcz Have you tried the patch from the PR?

jevank, Sep 30 '20 09:09

@jevank I tried it just now, but there's no change, even with all the VMs stopped.

I should also mention that on this laptop reboot doesn't work, so it's possible that Xen was trying to reboot and froze the computer.

pwmarcz, Sep 30 '20 10:09

I've run yet another bisection and here are the findings: https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg00002.html

marmarek, Oct 01 '20 02:10

A few bisection runs later... I've updated the issue description with the summary.

marmarek, Oct 03 '20 13:10

@pwmarcz I've force-pushed https://github.com/QubesOS/qubes-vmm-xen/pull/88 with fixes collected so far. Can you test it?

marmarek, Oct 05 '20 14:10

@marmarek Works so far! I did two successful resumes with VMs running.

pwmarcz, Oct 05 '20 15:10

Automated announcement from builder-github

The package python3-xen-4.14.0-5.fc32 has been pushed to the r4.1 testing repository for dom0. To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

qubesos-bot, Oct 06 '20 03:10

Automated announcement from builder-github

The package xen_4.14.0-5 has been pushed to the r4.1 testing repository for the Debian template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing (or the appropriate equivalent for your template version), then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade
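The "uncomment the line containing stretch-testing" step can be scripted. A minimal sketch, assuming GNU sed (for -i) and the usual "deb [options] URL suite component" line layout; enable_testing_suite is a hypothetical helper name, not part of Qubes tooling. Only commented-out deb lines for the named suite are touched, so commented deb-src lines stay disabled.

```shell
#!/bin/sh
# Hypothetical helper: uncomment "#deb ... <suite> ..." lines in an apt sources
# file. Assumes GNU sed; deb-src lines and other suites are left unchanged.
enable_testing_suite() {
    file=$1
    suite=$2
    # Address: commented "deb " lines mentioning the suite as a separate word.
    # Action: strip the leading "#" and any spaces after it.
    sed -i "/^#[[:space:]]*deb .*[[:space:]]$suite[[:space:]]/s/^#[[:space:]]*//" "$file"
}

# Usage (as root), before running the apt-get command above:
# for f in /etc/apt/sources.list.d/qubes-*.list; do
#     enable_testing_suite "$f" stretch-testing
# done
```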

qubesos-bot, Oct 06 '20 03:10

Works fine, great! So many combined bugs; thank you.

jevank, Oct 06 '20 07:10

Automated announcement from builder-github

The package xen_4.14.0-6+deb9u1 has been pushed to the r4.1 stable repository for the Debian template. To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

qubesos-bot, Oct 29 '20 03:10

Automated announcement from builder-github

The package python3-xen-4.14.0-6.fc32 has been pushed to the r4.1 stable repository for dom0. To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

qubesos-bot, Oct 29 '20 04:10

Resume is still not working for me, even on newer releases of Xen (as of 4.14.0-6). I'm not sure whether this should be treated as the same issue or a new one.

Qubes R4.1.0 on Thinkpad L14 Gen1 with AMD Ryzen 5 4500U.

On Xen 4.14.0-6, resume results in a solid power LED and (sometimes) the fan spinning, but the screen remains off. Any music playing in an AppVM continues after about 7 seconds. Keyboard input seems to have no effect. At about 17 seconds, playback breaks down (it starts looping on a single chunk of audio). At about 22 seconds, the computer reboots by itself.

The updates did fix something: on older releases of Xen 4.14 (before 4.14.0-5), attempting to resume would result in the fan turning on and the keyboard briefly lighting up, but the screen would remain black, with the power LED still indicating suspend. There was no indication of VMs resuming. After maybe 10 to 20 seconds, the computer would emit a "SmartBeep" error code of "0002: Internal bus error".

On Xen 4.13.1-4, resume from suspend works perfectly and takes about 2.5 seconds.

planiitis, Nov 15 '20 22:11

Resume is still not working as of 4.14.1-1.

What's the best way to help get to the bottom of this issue?

planiitis, Feb 27 '21 23:02

Resume from suspend leads to a reboot on a Dell Latitude 4700, Intel i5-8365U, with 4.14.1-3 (with and without the HWP patches).

Is there any way to make bisecting this a bit less painful? I can't just downgrade to 4.13 without rebuilding a number of other packages that, as built, depend on 4.14.

dmoerner, Mar 15 '21 16:03

What do you see in your logs when it reboots?

When the instant reboot happened, my logs showed nothing until I enabled S3-compatibility mode in the BIOS.

My Lenovo P14s (AMD Ryzen 7 PRO 4750U) went from rebooting instantly (s2idle), to just hanging (S3-compatibility mode in the BIOS), to colorful video corruption with the latest versions of everything.

I have tested this with kernel 5.12, vmm-xen 4.15-rc5 and linux-firmware 20210315, without success.

Anything I should test?

Mar 30 16:30:23 dom0 kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 30 16:30:23 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Mar 30 16:30:33 dom0 kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 30 16:30:33 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Mar 30 16:30:44 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=42273, emitted seq=42276
Mar 30 16:30:44 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kscreenlocker_g pid 6663 thread kscreenloc:cs0 pid 6666

isodude, Mar 30 '21 15:03

This is still happening on x1 carbon gen 9, using 4.14.1-4

prdn, Jun 03 '21 07:06

> This is still happening on x1 carbon gen 9, using 4.14.1-4

The X1 Carbon G9 and Tiger Lake (Intel Gen 11) don't support S3 at all; see https://github.com/QubesOS/qubes-issues/issues/6411

Btw, do current Ryzen 5000 CPUs (Ryzen 5500U, 5700U, 5850U, 5300U) still fully support S3? That is, once this suspend issue is fixed, will they suspend AND resume/wake up again?

bigdx, Jul 21 '21 12:07

Maybe a helpful hint: Debian 11 (RC2, kernel 5.10) suspends and resumes on a Ryzen 5500U.

bigdx, Jul 27 '21 21:07

@marmarek @fepitre Is there anything that can be tuned to debug this kind of issue, with clear steps to produce proper debugging output?

I'm seeing the same behavior on Q4.1 on an x230 booted from the coreboot 4.13 Heads branch. The behavior is not present when booting a live Fedora 32, which can suspend and resume from suspend. Xen is the cause, but how do I troubleshoot it?

Running journalctl --boot -1 shows absolutely nothing interesting from dom0, which pauses the VMs and then goes to suspend, and the xl dmesg output isn't verbose enough to troubleshoot anything there either. The power light flashes, showing it is in suspend, but there is no way to resume: the power button keeps flashing and it's impossible to wake the machine except with a hard reboot.

Idea: could a debug build be added to https://qubes.notset.fr/iso/ so that Xen is more verbose about its state? An ISO (slower) with all debugging options passed in the GRUB config would be a lifesaver for debugging. Alternatively, could someone give a pointer to what needs to be added so that all important components produce maximum output, maybe even with disk caching turned off so the logs are actually written to disk directly?

tlaurion, Jul 28 '21 13:07

Resume from suspend is also broken on Qubes R4.1-beta1 on a Dell XPS 15 9750 (with Xen 4.14.2 and Linux 5.10.47-1.fc32 or 5.12.14-1.fc32). Suspend seems to work, but resume just spins up the fan and keyboard backlight, and after a few seconds Xen reboots.

On R4.0.4 it worked flawlessly after adding kernel parameter mem_sleep_default=deep.
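A quick way to check whether mem_sleep_default=deep actually took effect is to read /sys/power/mem_sleep, where the Linux kernel lists the supported variants and brackets the active one (for example "s2idle [deep]"). This sketch assumes the dom0 kernel exposes that standard sysfs file; active_mem_sleep is a hypothetical helper name.

```shell
#!/bin/sh
# Print the active "mem" sleep variant from the contents of /sys/power/mem_sleep,
# i.e. the word the kernel wraps in brackets ("s2idle [deep]" -> "deep").
active_mem_sleep() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Only read the real file when it exists (hypothetical check, not a Qubes tool).
if [ -r /sys/power/mem_sleep ]; then
    echo "active mem_sleep variant: $(active_mem_sleep "$(cat /sys/power/mem_sleep)")"
fi
```

If this reports s2idle after boot, the kernel parameter was not applied (or the firmware offers no deep/S3 variant).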

Failed attempts to get debugging output (maybe someone knows what is missing to prevent the screen from turning off or how to get the serial output out of a laptop):

  • in /etc/default/grub, for GRUB_CMDLINE_XEN_DEFAULT remove console=none and add noreboot=1 loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin (noreboot=1 prevents Xen from rebooting)
  • in /etc/default/grub for GRUB_CMDLINE_LINUX add initcall_debug ignore_loglevel no_console_suspend
  • grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
  • force a sequential suspend/resume process: echo 0 > /sys/power/pm_async
  • trigger suspend
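The GRUB_CMDLINE_XEN_DEFAULT step in the list above amounts to a small string transformation. This is only a sketch: xen_debug_cmdline is a hypothetical helper, and the example input is illustrative rather than Qubes's actual default Xen command line.

```shell
#!/bin/sh
# Hypothetical helper: derive a debug Xen command line from an existing one by
# dropping console=none and appending the debug options from the list above,
# each only if it is not already present.
xen_debug_cmdline() (
    set -f                              # no globbing while word-splitting $1
    out=""
    for opt in $1; do
        [ "$opt" = "console=none" ] && continue   # keep Xen console output
        out="$out $opt"
    done
    for opt in noreboot=1 loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin; do
        case " $out " in
            *" $opt "*) ;;                        # already set, skip it
            *) out="$out $opt" ;;
        esac
    done
    printf '%s\n' "${out# }"                      # drop the leading space
)

# Illustrative input only:
xen_debug_cmdline "console=none dom0_mem=min:1024M"
# prints: dom0_mem=min:1024M noreboot=1 loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin
```

The output would go into GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub, followed by the grub2-mkconfig command from the list; the pm_async write still has to be done as root right before suspending.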

Failed attempts to fix resume based on common workarounds mentioned on HCL:

  • turn off all VMs prior to suspend (or blacklist WiFi drivers in sys-net: echo iwlmvm >> /rw/config/suspend-module-blacklist && echo iwlwifi >> /rw/config/suspend-module-blacklist)
  • xen parameters: iommu=no-igfx
  • kernel parameters: nomodeset i915.enable_rc6=0 i915.alpha_support=1 rd.blacklist.drivers=nouveau nouveau.modeset=0
  • BIOS: disable HT, disable TPM, disable Thunderbolt, disable C-states Control
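The sys-net blacklist workaround appends with >>, so running it twice leaves duplicate lines. A sketch of an idempotent variant, assuming the blacklist file holds one module name per line, as the echo commands suggest; add_to_suspend_blacklist is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: add a kernel module to a suspend blacklist file only if
# that exact name is not already listed, so the command is safe to re-run.
add_to_suspend_blacklist() {
    list=$1
    module=$2
    touch "$list"                       # create the file if it does not exist
    # -q quiet, -x whole-line match, -F fixed string (module names are literal)
    grep -qxF "$module" "$list" || printf '%s\n' "$module" >> "$list"
}

# Usage inside sys-net (as root), equivalent to the two echo commands above:
# add_to_suspend_blacklist /rw/config/suspend-module-blacklist iwlmvm
# add_to_suspend_blacklist /rw/config/suspend-module-blacklist iwlwifi
```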

If somebody wants to dig deeper:

  • https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues

@tlaurion, you can modify the Xen/kernel parameters in GRUB on the fly by pressing e and editing them before booting (Ctrl+x).

gw0, Aug 03 '21 07:08