qubes-issues
Resume from suspending is broken after update to Xen 4.14
Qubes R4.1
I've tested with an X1 ThinkPad gen6. Hyperthreading is disabled in UEFI; kernels 5.4.51 and 5.4.61. With Xen 4.13 everything works fine.
I tried with and without VMs running. Suspend appears to succeed (the ThinkPad's LED pulses slowly), but the machine doesn't resume.
Is this problem specific to me?
There are multiple bugs causing this, summary below.
S3 related bugs in Xen 4.14 summary:
- Shadow Stack resume path broken - fix in https://github.com/QubesOS/qubes-vmm-xen/pull/88 (sent upstream too already)
- CR4 restored too late: discussion, fix
- Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at credit2.c:2273 - no solution yet, a change in Xen 4.14 makes it more likely to happen
- crash in memguard_guard_stack - can be worked around by disabling Shadow Stack at build time
- hang after second resume (not analyzed yet)
X1 Thinkpad gen6
Was suspend ever working on this model in Qubes? I thought I remember hearing a long time ago that it wasn't working.
P.S. -- Please use the issue template. This is borderline unsuitable for qubes-issues.
Yes, it worked with Xen 4.13 until I upgraded to 4.14.
P.S. -- ok
The same happens in openQA: https://github.com/QubesOS/qubes-issues/issues/6049
Bisected to https://github.com/xen-project/xen/commit/633ecc4a7cb
And the fix: https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg01892.html
In my case, the patch makes the machine reboot on resume...
I do have a system where the resume is still broken, but looks like a proper panic instead of instant reboot now (5s timeout before reboot, instead of instant). Sadly I don't have serial console there, so I don't know what really happened.
Looks like I get a similar timeout, and my laptop has no serial port either.
Discussion on xen-devel: https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg02074.html
Seeing the same on System76 Galago Pro 4. Suspend worked on Xen 4.13. On 4.14 the fans and screen wake up and the Fn-1 fan control works, but the laptop doesn't respond to anything else, and there is nothing in the logs after suspending.
@pwmarcz Have you tried patch from PR?
@jevank I tried it just now, but there's no change, also with all the VMs stopped.
I should also mention that on this laptop reboot doesn't work, so it's possible that Xen was trying to reboot and froze the computer.
I've run yet another bisection and here are the findings: https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg00002.html
Few bisection runs later... I've updated issue description with the summary.
@pwmarcz I've force-pushed https://github.com/QubesOS/qubes-vmm-xen/pull/88 with fixes collected so far. Can you test it?
@marmarek Works so far! I did two successful resumes with VMs running.
Automated announcement from builder-github
The package python3-xen-4.14.0-5.fc32
has been pushed to the r4.1
testing repository for dom0.
To test this update, please install it with the following command:
sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing
Automated announcement from builder-github
The package xen_4.14.0-5
has been pushed to the r4.1
testing repository for the Debian template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list
by uncommenting the line containing stretch-testing
(or appropriate equivalent for your template version), then use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
Works fine, great. So many combined errors; thank you.
Automated announcement from builder-github
The package xen_4.14.0-6+deb9u1
has been pushed to the r4.1
stable repository for the Debian template.
To install this update, please use the standard update command:
sudo apt-get update && sudo apt-get dist-upgrade
Automated announcement from builder-github
The package python3-xen-4.14.0-6.fc32
has been pushed to the r4.1
stable repository for dom0.
To install this update, please use the standard update command:
sudo qubes-dom0-update
Or update dom0 via Qubes Manager.
Resume is still not working for me even on the newer releases of Xen (as of 4.14.0-6). Not sure whether this should be treated as the same or a new issue.
Qubes R4.1.0 on Thinkpad L14 Gen1 with AMD Ryzen 5 4500U.
On Xen 4.14.0-6, resume results in a solid power LED and (sometimes) fan spinning, but screen remains off. Any music playing in an AppVM continues after about 7 seconds. Keyboard input seems to have no effect. At about 17 seconds, playback breaks down (starts looping on a single chunk of audio). At about 22 seconds, the computer reboots by itself.
The updates did fix something: On older releases of Xen 4.14 (before 4.14.0-5), attempting to resume would result in fan turning on and keyboard briefly lighting up, but screen would remain black, with power LED still indicating suspend. No indication of VMs resuming. After maybe 10 to 20 seconds, the computer would emit a "SmartBeep" error code of "0002: Internal bus error".
On Xen 4.13.1-4, resume from suspend works perfectly and takes about 2.5 seconds.
Resume still not working as of 4.14.1-1.
What's the best way to help get to the bottom of this issue?
Resume from suspend leads to a reboot on a Dell Latitude 4700, Intel i5-8365U, with 4.14.1-3 (with and without the HWP patches).
Is there any way to make bisecting this a bit less painful? I can't just downgrade to 4.13 without rebuilding a number of other packages that, as built, depend on 4.14.
What do you see in your logs when it reboots?
When the instant reboot happened, there was nothing in my logs until I enabled s3-compatibility mode in the BIOS.
My Lenovo P14s (AMD Ryzen 7 PRO 4750U) went from rebooting instantly (s2idle) to just hanging (s3-compatibility mode in bios), to the latest versions of everything where I get colorful video corruption.
I have tested this on kernel 5.12 and vmm-xen 4.15-rc5 and linux-firmware 20210315 without success.
Anything I should test?
Mar 30 16:30:23 dom0 kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 30 16:30:23 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Mar 30 16:30:33 dom0 kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 30 16:30:33 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Mar 30 16:30:44 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=42273, emitted seq=42276
Mar 30 16:30:44 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kscreenlocker_g pid 6663 thread kscreenloc:cs0 pid 6666
This is still happening on x1 carbon gen 9, using 4.14.1-4
The X1 Carbon G9 and Tiger Lake (Intel Gen 11) don't support S3 at all, see https://github.com/QubesOS/qubes-issues/issues/6411
Btw. do current Ryzen 5000 CPUs (Ryzen 5500U, 5700U, 5850U, 5300U) still fully support S3? Meaning, if this suspend-issue is fixed, they will suspend AND resume/wake up again?
Maybe a helpful hint: Debian 11 (RC2, kernel 5.10) suspends and resumes on a Ryzen 5500U.
@marmarek @fepitre Anything that can be tuned in to debug this kind of issue with clear steps to give proper debugging output?
I'm seeing the same behavior on Q4.1 on an x230 booted from the coreboot 4.13 Heads branch. The behavior is not present when booting live Fedora 32, which can suspend and resume. Xen is the cause, but how to troubleshoot?
Running journalctl --boot -1 shows absolutely nothing interesting from dom0, which pauses the VMs and then goes to suspend; xl dmesg is not verbose enough to show anything useful either.
The power light flashes, showing it is in suspend, but there is no way to resume: the power button keeps flashing and it's impossible to wake the machine without a hard reboot.
Idea: could a debug build be added to https://qubes.notset.fr/iso/ so that Xen is more verbose about its state? Having an ISO (slower) with all debugging options passed to the grub config would be a life saver for debugging purposes. Or alternatively, a pointer to what needs to be added so that all important components produce maximum output, maybe even turning off disk caching so the logs are actually saved to disk directly?
Resume from suspend is also broken on Qubes R4.1-beta1 on Dell XPS 15 9750 (with Xen 4.14.2 and Linux 5.10.47-1.fc32 or 5.12.14-1.fc32). Suspend seems to work, but resume just spins up the fan and keyboard backlight, and after a few seconds Xen reboots.
On R4.0.4 it worked flawlessly after adding kernel parameter mem_sleep_default=deep.
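To check which suspend mode is actually selected (relevant to mem_sleep_default=deep and the s2idle reports above), here is a minimal sketch. It assumes the standard Linux sysfs file /sys/power/mem_sleep is present in dom0, where the active mode is shown in square brackets; the function name active_sleep_mode is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read text in the format of /sys/power/mem_sleep
# on stdin (e.g. "s2idle [deep]") and print the bracketed, i.e. active, mode.
active_sleep_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on a real system (assumes the file exists):
#   active_sleep_mode < /sys/power/mem_sleep
```

If this prints s2idle rather than deep, the mem_sleep_default=deep parameter may not have taken effect, or the firmware may not offer S3.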
Failed attempts to get debugging output (maybe someone knows what is missing to prevent the screen from turning off or how to get the serial output out of a laptop):
- in /etc/default/grub for GRUB_CMDLINE_XEN_DEFAULT remove console=none and add noreboot=1 loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin (noreboot=1 prevents Xen reboot)
- in /etc/default/grub for GRUB_CMDLINE_LINUX add initcall_debug ignore_loglevel no_console_suspend
- grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
- force sequential suspend/resume process: echo 0 > /sys/power/pm_async
- trigger suspend
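The Xen command-line change in the first step can be sketched as a small string transformation. This is only an illustration: the function name xen_debug_cmdline is hypothetical, it works on a string passed as an argument and does not touch /etc/default/grub, and the example input value is an assumption rather than a real default.

```shell
#!/bin/sh
# Hypothetical helper: take an existing Xen command line as $1,
# drop console=none, and append the debug options from the list above.
# Runs in a subshell so that "set -f" (no globbing) does not leak out.
xen_debug_cmdline() (
    set -f
    out=""
    for word in $1; do
        if [ "$word" = "console=none" ]; then
            continue
        fi
        out="${out:+$out }$word"
    done
    printf '%s\n' "${out:+$out }noreboot=1 loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin"
)

# Example (the input string is made up for illustration):
#   xen_debug_cmdline "console=none dom0_mem=min:1024M"
```

The printed string would still have to be placed into GRUB_CMDLINE_XEN_DEFAULT by hand, followed by the grub2-mkconfig step from the list.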
Failed attempts to fix resume based on common workarounds mentioned on HCL:
- turn off all VMs prior to suspend (or blacklist WiFi drivers in sys-net: echo iwlmvm >> /rw/config/suspend-module-blacklist && echo iwlwifi >> /rw/config/suspend-module-blacklist)
- xen parameters: iommu=no-igfx
- kernel parameters: nomodeset i915.enable_rc6=0 i915.alpha_support=1 rd.blacklist.drivers=nouveau nouveau.modeset=0
- BIOS: disable HT, disable TPM, disable Thunderbolt, disable C-states Control
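Running the echo commands from the first workaround more than once adds duplicate lines to the blacklist file. A minimal sketch of an idempotent variant follows; the function name blacklist_module is hypothetical, and the file path in the usage comment is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: append module name $2 to blacklist file $1
# unless an identical line is already present.
blacklist_module() {
    touch "$1"
    # -q quiet, -x match whole line, -F treat the name as a fixed string
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage inside sys-net (path taken from the workaround above):
#   blacklist_module /rw/config/suspend-module-blacklist iwlmvm
#   blacklist_module /rw/config/suspend-module-blacklist iwlwifi
```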
If somebody wants to dig deeper:
- https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues
@tlaurion, you can modify the xen/kernel parameters in Grub on-the-fly by pressing e and editing them prior to booting (Ctrl+x).