tuxonice-kernel-old Kernel address space randomization

Hello,

I am using Ubuntu and tried the 4.8 and 4.10 kernels from the PPA. However, both crash on resume without a stack trace. The crash occurs reliably both on my machine and in an Ubuntu Server VM running under KVM.

While debugging the problem, I found that the crash occurs in core_restore_code while copying the pages from the PBE list, specifically when copying the last page. The last page is the boot_kernel_data page. Copying it fails because the target address is the same as it was during suspend (i.e. boot_kernel_data_buffer), but due to KASLR that address no longer maps to anything in the identity mapping.

When I modified the code to store boot_kernel_data_buffer - __PAGE_OFFSET in the header and convert the value back when reading the header during resume, the page-copying loop finished without crashing, but the resume procedure still crashed later on.

So I recompiled the kernel with the CONFIG_RANDOMIZE_BASE option turned off, and with that, resume works without any problems.

I do not know whether hibernation can coexist with KASLR (possibly not, since identity-mapped addresses are stored in various places), but perhaps TuxOnIce could be modified to store the randomized base addresses in the image and restore them as well (again, I am not sure whether this is possible at all).

In the meantime, I would suggest compiling the PPA kernels without KASLR.

Apr 21 '17 17:04 ivaradi

Thanks for a very helpful report. I'll try to find some time to look into this further, but I agree that for now address space randomisation should be turned off. I suspect absolute addresses are stored in a lot of places, and it won't be feasible to replace them all.

Apr 21 '17 22:04 NigelCunningham

Hi, I have disabled the CONFIG_RANDOMIZE_BASE option in the latest Ubuntu PPA builds for kernels 4.8 and 4.10. TuxOnIce works much better now. Thanks!

Nigel, there was also an issue with KASLR and hibernation in the vanilla kernel. Maybe its solution would help fix this for TuxOnIce too:
https://patchwork.kernel.org/patch/9174469/
https://patchwork.kernel.org/patch/9172981/

Apr 28 '17 21:04 mschlaeffer

https://launchpad.net/~tuxonice/+archive/ubuntu/staging/+packages

now has two kernels with CONFIG_RANDOMIZE_BASE enabled, if anyone wants to try them (or debug):

linux - 4.15.0-13.14 ppa7
linux - 4.13.0-45.50 ppa3

Jun 20 '18 18:06 mschlaeffer

I have tested the 4.15.0 kernel. With a full graphical desktop, it fails at the start of hibernation. I do not know the reason yet.

I also tried recovery mode. It can hibernate but crashes on resume. When I specified the nokaslr kernel option, it resumed without problems.

So it seems that TuxOnIce is still not compatible with KASLR. My impression, though, was that making it compatible would require significant development. Has any work been done on this, i.e. is TuxOnIce supposed to work with KASLR now? If so, I will try to find the cause of the resume crash. If not, I would suggest continuing to build the kernel package with KASLR turned off.

Jun 21 '18 18:06 ivaradi

I managed to do some testing with the 4.15 kernel. It turned out that the proprietary nVidia driver and the kernel framebuffer driver conflict with each other during hibernation, after the atomic copy. The last lines of the log are as follows:

[  221.057640] Doing atomic copy/restore.
[  221.267431] ACPI: Preparing to enter system sleep state S4
[  221.274315] PM: Saving platform NVS memory
[  221.279003] Disabling non-boot CPUs ...
[  221.296437] IRQ 23: no longer affine to CPU1
[  221.300715] IRQ 30: no longer affine to CPU1
[  221.305987] smpboot: CPU 1 is now offline
[  221.324430] IRQ 24: no longer affine to CPU2
[  221.328707] IRQ 27: no longer affine to CPU2
[  221.333978] smpboot: CPU 2 is now offline
[  221.352428] IRQ 25: no longer affine to CPU3
[  221.356706] IRQ 26: no longer affine to CPU3
[  221.361981] smpboot: CPU 3 is now offline
[  221.368463] PM: Restoring platform NVS memory
[  221.373375] Enabling non-boot CPUs ...
[  221.377383] x86: Booting SMP configuration:
[  221.381802] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  221.389763]  cache: parent cpu1 should not be sleeping
[  221.395349] CPU1 is up
[  221.397865] smpboot: Booting Node 0 Processor 2 APIC 0x4
[  221.405975]  cache: parent cpu2 should not be sleeping
[  221.411640] CPU2 is up
[  221.414187] smpboot: Booting Node 0 Processor 3 APIC 0x6
[  221.422237]  cache: parent cpu3 should not be sleeping
[  221.427833] CPU3 is up
[  221.432792] ACPI: Waking up from system sleep state S4
[  221.448357] serial 00:02: activated
[  221.461862] 8139too 0000:04:00.0 eth1: link down
[  221.471720] do_IRQ: 1.37 No irq handler for vector
[  221.558053] r8169 0000:07:00.0 eth0: link down
[  221.768017] ata5: SATA link down (SStatus 0 SControl 300)
[  221.770893] ata4: SATA link down (SStatus 0 SControl 300)
[  221.781814] ata3: SATA link down (SStatus 0 SControl 300)
[  221.937827] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  221.957379] ata6.00: configured for UDMA/133
[  222.029834] firewire_core 0000:04:02.0: rediscovered device fw0
[  222.082961] ata1.00: SATA link down (SStatus 0 SControl 300)
[  222.098338] ata1.01: SATA link down (SStatus 0 SControl 300)
[  222.229909] ata2.00: SATA link down (SStatus 0 SControl 300)
[  222.245003] ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  222.274277] ata2.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
[  222.291564] ata2.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
[  222.308801] ata2.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
[  222.328919] ata2.01: configured for UDMA/133
[  223.332587] r8169 0000:07:00.0 eth0: link up
[  223.557008] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000857d:0:0

At this point the kernel hangs, but it still responds to Alt+SysRq keys.

If, however, a text console is used or X is not running, hibernation (and resume) succeeds.

So I have been using the 4.15 kernel for some time now with a text console (and, of course, a GNOME session), and it seems to work well.

Of course, the same nVidia driver worked with kernel 4.13, so some change in the kernel must be triggering this failure. However, it is quite possible that the nVidia driver is at fault and kernel 4.15 merely exposes the problem. I would welcome any suggestions on how to make the framebuffer console (and thus the fbsplash userui) work. (I have tried setting the nvidia-drm.modeset kernel parameter to 0 and to 1, but neither helped.)

Jul 18 '18 18:07 ivaradi
