linux-surface
SL1 suspend-then-hibernate wakes up unexpectedly and gets very hot
While using suspend-then-hibernate with LUKS encryption, the laptop wakes up unexpectedly.
What usually happens is that I close the lid (which is configured to do suspend-then-hibernate in my systemd logind.conf) and put the laptop in my backpack. After some time (I would say about 10 minutes) it gets SUPER hot; it never gets this hot, not even under heavy load. When I open the lid, the screen is black but on, and I have to hold the power button for a long time to shut it down.
I cannot reproduce this consistently, as it only happens about 1 in 10 times or so, and I haven't yet figured out a pattern or anything that could trigger it.
Is this a common issue? Could this be due to a misconfiguration on my part, or is this an actual software issue?
Environment
- Hardware model: Surface Laptop 1
- Kernel version: Linux furcase 5.14.11-arch1-2-surface #1 SMP PREEMPT Sun, 10 Oct 2021 17:58:33 +0000 x86_64 GNU/Linux
- Distribution: Arch Linux
- mkinitcpio hooks: HOOKS=(base systemd autodetect keyboard sd-vconsole modconf block sd-encrypt filesystems fsck)
- mkinitcpio modules: MODULES=(surface_aggregator surface_aggregator_registry surface_hid_core surface_hid intel_lpss intel_lpss_pci 8250_dw surface_kbd)
- I have this in my /etc/systemd/logind.conf: HandleLidSwitch=suspend-then-hibernate
- And this in my /etc/systemd/sleep.conf: AllowSuspendThenHibernate=yes
Can confirm this also happens on Fedora. But I don't think it's related to the linux-surface kernel; it seems to be a general SB Linux problem. Mine got so hot that the screen turned light yellowish (burn marks). It seems like the CPU is at a constant 100% while waiting for LUKS input.
Can you try turning off bluetooth before suspending? AFAIK that has a tendency to wake the device.
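To see which devices are allowed to wake the machine at all, something like the following might help. It is only a sketch: it assumes the usual four-column layout of /proc/acpi/wakeup (Device, S-state, Status, Sysfs node), and the function name `wakeup_enabled` is made up for illustration.

```shell
# List devices that are currently allowed to wake the system.
# Reads /proc/acpi/wakeup by default; another file can be passed instead
# (useful for looking at a saved copy of the table).
wakeup_enabled() {
    # Columns: Device, S-state, Status, Sysfs node. Skip the header line
    # and print the device name of every entry marked "*enabled".
    awk 'NR > 1 && $3 == "*enabled" { print $1 }' "${1:-/proc/acpi/wakeup}"
}
```

Running `wakeup_enabled` before suspending shows the candidates. Bluetooth itself can be switched off with `rfkill block bluetooth`, and an individual wakeup source can be toggled by writing its name to /proc/acpi/wakeup as root (the change does not persist across reboots).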
Also can you try to figure out if the device gets hot when suspending or when resuming? Or does this happen when the device is fully resumed but closed due to some other crash?
You could try getting a log with journalctl -b -<n> where <n> is the number of boots since the issue happened. I.e. if the issue happens and you have to force shutdown the device, use journalctl -b -1 on the next boot. This might not show much though if the system crashes during suspend, but it will show if something happens before or (if it even comes that far) after.
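A rough sketch of how to narrow such a log down to the sleep-related parts. The grep patterns are assumptions about the usual wording of kernel power-management, systemd-sleep and logind lid messages, and `prev-boot.log` and `sleep_events` are hypothetical names.

```shell
# Print only the lines of a saved journal dump that look sleep-related.
# $1: path to a log file previously written by journalctl.
sleep_events() {
    # "PM: " marks kernel power-management messages, "systemd-sleep" is the
    # helper that performs suspend/hibernate, and logind reports lid events.
    grep -E 'PM: |systemd-sleep|Lid (closed|opened)' "$1"
}

# Example (run on the affected machine after the forced shutdown):
#   journalctl -b -1 --no-pager -o short-precise > prev-boot.log
#   sleep_events prev-boot.log
```

If the last matching line is a suspend or hibernation entry with no matching exit, that would hint that the system got stuck on the way down rather than after resuming.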
> It seems like the CPU is at constant 100% while waiting for Luks input.
SB1 or SB2? On the SB2 this can maybe be fixed by including surface_aggregator and surface_acpi_notify in the initramfs. This part is unrelated to the SL1 problem though, so the getting-hot-during-suspend problem might be a different one.
I managed to reproduce it. The steps I took are the following:
- I reduced the time it takes to go from suspend to hibernate to 30 seconds so I can make tests faster
- I closed the lid, waited about 5~10 minutes to see if it gets hot
- If not, I would just unlock it and close the lid again
I finally managed to make it do it again (after about 3 or 4 times).
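For anyone wanting to repeat the test: the shortened delay was presumably set through systemd's sleep.conf. A minimal fragment, assuming the `HibernateDelaySec=` option is what was used (the post does not say so explicitly), might look like this:

```ini
# /etc/systemd/sleep.conf
[Sleep]
AllowSuspendThenHibernate=yes
# Time spent in suspend before switching to hibernation; shortened for testing.
HibernateDelaySec=30s
```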
To rectify what I said in my initial post:
The screen is actually OFF (at least this time it was off) but the keyboard lights are ON. When I pressed keys, nothing happened. Pressing the power button displays the Microsoft logo, as if the computer had been off and was starting to boot (how is it getting hot then??). Afterwards, sd-encrypt prompts for a password but the keyboard is unresponsive, so I have to hold down the power button again until the computer turns off, and then turn it on again, after which it works normally.
I dumped the journalctl output to a file. Do I need to search for something specific in it? Is it safe to put a link to the entire file here, or do the logs contain sensitive data?
(I have not turned Bluetooth off yet, since I was trying to replicate the bug, but I will do so and see whether it stops the issue.)
> The screen is actually OFF (at least this time it was off) but the keyboard (the lights) are ON. When I pressed keys, nothing happens. Pressing the power button displays the Microsoft logo as if the computer was off and it started booting (how is it getting hot then ??).
Sounds a bit like it doesn't hibernate properly. Maybe it thinks it's off but in reality it isn't. Did you try hibernation by itself to see if that works properly? It might need some configuring to get right.
> Afterwards, sd-encrypt prompts for a password but the keyboard is unresponsive, so I have to press down on the power button again, until the computer turns off, and the turn it on again and then it works normally.
If the keyboard doesn't work at that point, it's normally due to missing driver support in the initramfs. Is it possible that the bootloader uses the wrong initramfs to resume with? I think that's probably a hibernation problem and might be unrelated to the other stuff. So you should really test hibernation separately.
> I dumped the journalctl output to a file. Do I need to search for something specific in it ? Is it safe to put a link to the entire file here or does the logs contain sensitive data ?
There shouldn't be too much sensitive information in there. If you're concerned: WiFi SSID, MAC addresses, device UUIDs, maybe username/hostname. Other things depend on the systemd services and userspace stuff you have running.
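If you want to mask the most obvious identifiers before uploading, a simple sed filter is one option. This is only a sketch with a made-up function name (`redact_log`): it masks anything shaped like a MAC address or a UUID and nothing else, so SSIDs, usernames and the hostname would still need to be checked by hand.

```shell
# Mask MAC addresses and UUIDs in a log read from stdin.
redact_log() {
    # First expression: six colon-separated hex pairs (MAC address).
    # Second expression: 8-4-4-4-12 hex groups (UUID).
    sed -E \
        -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g' \
        -e 's/[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}/UUID-REDACTED/g'
}

# Example:
#   journalctl -b -1 --no-pager | redact_log > prev-boot-redacted.log
```

Ordinary timestamps such as 23:03:18 only have three colon-separated groups, so they are left alone.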
> (I did not turn bluetooth off yet for the purpose of trying to replicate the bug, but I will do and try to see if it stops the issue)
Based on your new findings it seems less likely that wakeups from Bluetooth are the culprit, but it's probably a good idea to rule that out anyway.
It looks like you're entering and leaving suspend quite often beforehand. I assume that's you trying to trigger the issue? Another thing is that it resumes and switches into hibernation quite quickly. Is that intended?
For the last suspend/resume cycle: It looks like the system enters and exits hibernation properly. There are some errors related to WiFi but nothing that we haven't seen before. Essentially, the card detects an error when resuming from suspend and resets itself, after which it looks like it should be working, I think. In any case can you try blacklisting mwifiex and mwifiex_pci temporarily and test if that changes anything?
Also, can you try to reproduce this issue with hibernation only?
Hi, sorry for my long silence.
Yes, I was trying to trigger the issue. And yes, the suspend to hibernate delay was very short so I didn't have to wait so long when I was trying to trigger it.
I disabled mwifiex and mwifiex_pci using kernel parameters (module_blacklist=mwifiex,mwifiex_pci) and used normal hibernation (no suspend-then-hibernate). The problem still occurred.
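A quick way to double-check that the parameter actually reached the running kernel is to read it back from /proc/cmdline. The helper below is a sketch with a hypothetical name (`cmdline_param`), and it assumes the parameter value contains no spaces.

```shell
# Print the value of a kernel command-line parameter.
# $1: parameter name, $2: optional file to read (defaults to /proc/cmdline).
cmdline_param() {
    # Put every parameter on its own line, then strip the "name=" prefix
    # from the matching one and print what remains.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | sed -n "s/^$1=//p"
}

# Example:
#   cmdline_param module_blacklist
```

With the blacklist in place, `lsmod | grep mwifiex` should additionally print nothing.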
Here is another dump of what happened.
Again, tried it multiple times before it triggered, and it triggered at my last attempt (23:03 to 23:18).
When I opened the lid, the screen was turned on and displaying the last thing that had been on my screen (Firefox in this case). I had to hold down the power button for it to shut down. When I started the machine again, the browser was still open, so that means it actually resumed.
I triggered the issue by closing the lid (I set the lid closing action to hibernate).
Should I try to trigger the issue by running systemctl hibernate instead, without closing the lid?