nixos-images

kexec fails due to IMA being enforced on Azure VMs

Open AkechiShiro opened this issue 1 year ago • 21 comments

kexec fails because IMA (Integrity Measurement Architecture) is enforced on Azure. I'm using nixos-anywhere and just saw that the image it uses for unattended installs comes from this repository. See: https://github.com/numtide/nixos-anywhere/issues/189

Do I need to build a new image in order to use kexec -s instead of plain kexec?

The failure is caused by IMA appraisal being enabled on Azure VMs:

[ 3099.239362] ima: impossible to appraise a kernel image without a file descriptor; try using kexec_file_load syscall.

More details here: https://kernsec.org/pipermail/linux-security-module-archive/2018-October/008951.html

To build a compatible image, should I modify the build-images.sh script to suit my needs?

AkechiShiro avatar Aug 25 '23 15:08 AkechiShiro

We now pass this flag, but it's not clear to me what else is needed.

Mic92 avatar Aug 26 '23 19:08 Mic92

@Ma27 I will investigate more thoroughly during the coming week and report back if I find a solution. I will try to find a way to enroll or sign the kernel so that it is allowed to execute on Azure. If I get it working, I'll let you know the steps I took.

AkechiShiro avatar Aug 26 '23 20:08 AkechiShiro

@Mic92 , @AkechiShiro: FYI: we have been successfully trialing nixos-anywhere with Azure Gen2 'Standard B' image types as described here: https://github.com/tiiuae/ghaf-infra/blob/main/docs/nixos-anywhere.md.

henrirosten avatar Oct 27 '23 10:10 henrirosten

Hi @henrirosten

I'm not sure what you mean by Azure Gen 2 Standard B images. Is the securityType of the VM TrustedLaunch? Could you give more information?

nixos-anywhere fails to kexec due to a missing signature (SecureBoot being enabled and enforced).

Even disabling Integrity Measurement doesn't seem to be enough.

For more context, trying to modprobe unsigned kernel drivers also fails.

AkechiShiro avatar Oct 27 '23 10:10 AkechiShiro

'Standard' is the Azure security type that disables secure boot and IMA.

'B'-series refers to Azure VM image sizes which are deployed on hardware types and processors as described here: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-b-series-burstable.

henrirosten avatar Oct 27 '23 10:10 henrirosten

There must be a way forward to publish an Azure-compatible NixOS image in the official Azure Marketplace. We would need to work with the Lanzaboote folks and find a way to combine NixOS and Lanzaboote in order to have at least SecureBoot support; IMA will have to be disabled at first.

vTPM doesn't really matter at first, I'd guess. But making nixos-anywhere work on other distributions that run with SecureBoot seems very non-trivial. The only workaround I can see is to disable SecureBoot temporarily, use nixos-anywhere, then enable it again. But what will happen then? I believe nixos-anywhere doesn't ship Lanzaboote in the NixOS image...

@Mic92: Would a PR documenting the steps to use nixos-anywhere on Azure Gen 2 VMs created with SecurityType TrustedLaunch (rather than Standard), by temporarily disabling SecureBoot, be acceptable for now? Or would it be useless?

There is some documentation for anyone interested in testing their non-official NixOS VM image: https://learn.microsoft.com/en-us/partner-center/marketplace/azure-vm-image-test

If anyone is interested in working on this, I'm willing to make slow progress on it as far as my time allows.

AkechiShiro avatar Oct 28 '23 01:10 AkechiShiro

@AkechiShiro you mean having a guide that describes how to install on Azure with nixos-anywhere? Sure. Could be dropped here: https://github.com/nix-community/nixos-anywhere/tree/main/docs/howtos

Mic92 avatar Oct 28 '23 06:10 Mic92

Here is one idea: shouldn't it be possible to kexec into the original kernel with ima_appraise=off and then do the actual NixOS kexec afterwards?
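A minimal sketch of this two-stage idea, not a tested procedure. It assumes Debian/Ubuntu-style paths (`/boot/vmlinuz-<release>` and `/boot/initrd.img-<release>`); the helper names `stage1_cmdline` and `print_stage1_commands` are made up for illustration. The commands are only printed so they can be reviewed before anyone runs them.

```shell
#!/bin/sh
# Hypothetical helper: append ima_appraise=off to a kernel command line
# unless the parameter is already present.
stage1_cmdline() {
  case " $1 " in
    *" ima_appraise=off "*) printf '%s\n' "$1" ;;
    *) printf '%s ima_appraise=off\n' "$1" ;;
  esac
}

# Print (do not run) the stage-1 commands: reload the distribution's own
# kernel via kexec_file_load (-s) with the extended command line.
# $1 = kernel release, $2 = current kernel command line.
# The /boot paths are an assumption and differ between distributions.
print_stage1_commands() {
  release=$1
  cmdline=$(stage1_cmdline "$2")
  printf 'kexec -s -l /boot/vmlinuz-%s --initrd=/boot/initrd.img-%s --command-line="%s"\n' \
    "$release" "$release" "$cmdline"
  printf 'kexec -e\n'
}
```

On a real machine the inputs would come from `print_stage1_commands "$(uname -r)" "$(cat /proc/cmdline)"`. After stage 1 has booted, the NixOS kexec script would be run as the second stage.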

Mic92 avatar Dec 24 '23 09:12 Mic92

@Mic92 I will try that soon. I've tried this on a Debian 11 cloud image and was still stuck with some weird issue I couldn't debug at all, so I'll need to retry.

AkechiShiro avatar Dec 26 '23 14:12 AkechiShiro

If it was just an old kernel, then https://github.com/nix-community/nixos-images/commit/eaf2d21fa940a86ef7bc2b583850f725b86dc180 might solve it.

Mic92 avatar Dec 26 '23 14:12 Mic92

Hi @Mic92, sorry it took me a while to give this a try.

I ran the following as root on a machine with Secure Boot disabled and ima_appraisal=off:

curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/nixos-kexec-installer-noninteractive-x86_64-linux.tar.gz | tar -xzf- -C /root
/root/kexec/run

I got this output after the reboot (after the kexec, I believe). It seems something bad happened:

username login: [   10.089786] CPU1 failed to report alive state
[   10.129163] BUG: kernel NULL pointer dereference, address: 0000000000000010
[   10.129779] #PF: supervisor read access in kernel mode
[   10.129779] #PF: error_code(0x0000) - not-present page
[   10.129779] PGD 0 P4D 0 
[   10.129779] Oops: 0000 [#1] PREEMPT SMP PTI
[   10.129779] CPU: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.6.10 #1-NixOS
[   10.129779] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/12/2023
[   10.129779] Workqueue: eval_map_wq tracer_init_tracefs_work_func
[   10.129779] RIP: 0010:event_create_dir+0x29/0x5d0
[   10.129779] Code: 90 41 57 41 56 41 55 41 54 49 89 f4 55 53 48 83 ec 18 48 8b 46 28 4c 8b 6e 10 48 c7 c6 71 0b 9a 87 48 89 7c 24 08 48 89 04 24  8b 45 10 48 8b 18 48 89 df e8 58 40 8e 00 85 c0 0f 84 ec 04 00
[   10.129779] RSP: 0000:ffffa21d00093dd8 EFLAGS: 00010296
[   10.129779] RAX: 0000000000000000 RBX: ffff8bc14020e1e0 RCX: ffff8bc140808080
[   10.129779] RDX: 0000000000000000 RSI: ffffffff879a0b71 RDI: ffff8bc140442b40
[   10.129779] RBP: ffffffff88155260 R08: ffff8bc140b6c060 R09: 0000000000038ee0
[   10.129779] R10: ffff8bc140c3f080 R11: 006e776f64726165 R12: ffff8bc14020e1e0
[   10.129779] R13: 0000000000000000 R14: ffff8bc1402ed405 R15: ffffffff8875c948
[   10.129779] FS:  0000000000000000(0000) GS:ffff8bc1fbc00000(0000) knlGS:0000000000000000
[   10.129779] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.129779] CR2: 0000000000000010 CR3: 000000003d220001 CR4: 00000000003706f0
[   10.129779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.129779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.129779] Call Trace:
[   10.129779]  
[   10.129779]  ? __die+0x23/0x70
[   10.129779]  ? page_fault_oops+0x17d/0x4b0
[   10.129779]  ? exc_page_fault+0x6d/0x150
[   10.129779]  ? asm_exc_page_fault+0x26/0x30
[   10.129779]  ? event_create_dir+0x29/0x5d0
[   10.129779]  ? event_create_dir+0x123/0x5d0
[   10.129779]  __trace_early_add_event_dirs+0x33/0x70
[   10.129779]  event_trace_init+0x98/0xf0
[   10.129779]  tracer_init_tracefs_work_func+0xa/0x2e0
[   10.129779]  process_one_work+0x174/0x340
[   10.129779]  worker_thread+0x27b/0x3a0
[   10.129779]  ? __pfx_worker_thread+0x10/0x10
[   10.129779]  kthread+0xe8/0x120
[   10.129779]  ? __pfx_kthread+0x10/0x10
[   10.129779]  ret_from_fork+0x34/0x50
[   10.129779]  ? __pfx_kthread+0x10/0x10
[   10.129779]  ret_from_fork_asm+0x1b/0x30
[   10.129779]  
[   10.129779] Modules linked in:
[   10.129779] CR2: 0000000000000010
[   10.129779] ---[ end trace 0000000000000000 ]---
[   10.129779] RIP: 0010:event_create_dir+0x29/0x5d0
[   10.129779] Code: 90 41 57 41 56 41 55 41 54 49 89 f4 55 53 48 83 ec 18 48 8b 46 28 4c 8b 6e 10 48 c7 c6 71 0b 9a 87 48 89 7c 24 08 48 89 04 24  8b 45 10 48 8b 18 48 89 df e8 58 40 8e 00 85 c0 0f 84 ec 04 00
[   10.129779] RSP: 0000:ffffa21d00093dd8 EFLAGS: 00010296
[   10.129779] RAX: 0000000000000000 RBX: ffff8bc14020e1e0 RCX: ffff8bc140808080
[   10.129779] RDX: 0000000000000000 RSI: ffffffff879a0b71 RDI: ffff8bc140442b40
[   10.129779] RBP: ffffffff88155260 R08: ffff8bc140b6c060 R09: 0000000000038ee0
[   10.129779] R10: ffff8bc140c3f080 R11: 006e776f64726165 R12: ffff8bc14020e1e0
[   10.129779] R13: 0000000000000000 R14: ffff8bc1402ed405 R15: ffffffff8875c948
[   10.129779] FS:  0000000000000000(0000) GS:ffff8bc1fbc00000(0000) knlGS:0000000000000000
[   10.129779] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.129779] CR2: 0000000000000010 CR3: 000000003d220001 CR4: 00000000003706f0
[   10.129779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.129779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.129779] note: kworker/u4:0[11] exited with irqs disabled

AkechiShiro avatar Jan 12 '24 23:01 AkechiShiro

Wait, I gave it a second try and it's now working, so ima_appraisal=off does allow the kexec to happen with SecureBoot disabled. The specific image used was Ubuntu 23.10.

EDIT: Network connectivity seems to be broken; I believe DHCP did not run anew in order to get an IP address.

> (experimental, only tested for nixos-unstable) Static ip addresses and routes are restored after reboot. Interface that had dynamic addresses before are configured with DHCP and to accept prefixes from ipv6 router advertisement

The IP has been preserved, but the DNS server probably needs to be tweaked. I'm not sure what the default one is; I will edit if I find the answer.

EDIT 2: Running the second kexec in order to install NixOS (using grub, with the default example plus some additional configuration) led to an unbootable machine: it is stuck in Hyper-V's UEFI, saying it found no suitable boot system.

I'll have to try systemd-boot; I may also need to tweak disko's configuration.

AkechiShiro avatar Jan 13 '24 00:01 AkechiShiro

So it seems ima_appraisal=off is not even needed if SecureBoot is off. The first kexec happens successfully:

+ init=/nix/store/nadvk7k5qam9iq19kshbk2c045hkd5q6-nixos-system-nixos-23.11pre-git/init
+ kernelParams=console=tty0 console=ttyS0,115200 loglevel=4
+ readlink -f /root/kexec/kexec/run
+ dirname /root/kexec/kexec/run
+ SCRIPT_DIR=/root/kexec/kexec
+ TMPDIR=/root/kexec/kexec mktemp -d
+ INITRD_TMP=/root/kexec/kexec/tmp.mI4YwicutB
+ cd /root/kexec/kexec/tmp.mI4YwicutB
+ trap cleanup EXIT
+ mkdir -p ssh
+ extractPubKeys /root
+ home=/root
+ key=/root/.ssh/authorized_keys
+ test -e /root/.ssh/authorized_keys
+ grep -o \(\(ssh\|ecdsa\|sk\)-[^ ]* .*\) /root/.ssh/authorized_keys
+ key=/root/.ssh/authorized_keys2
+ test -e /root/.ssh/authorized_keys2
+ test -n root
+ sh -c echo ~root
+ sudo_home=/root
+ extractPubKeys /root
+ home=/root
+ key=/root/.ssh/authorized_keys
+ test -e /root/.ssh/authorized_keys
+ grep -o \(\(ssh\|ecdsa\|sk\)-[^ ]* .*\) /root/.ssh/authorized_keys
+ key=/root/.ssh/authorized_keys2
+ test -e /root/.ssh/authorized_keys2
+ test -e /etc/ssh/authorized_keys.d/root
+ test -n root
+ test -e /etc/ssh/authorized_keys.d/root
+ test -e /etc/ssh/ssh_host_dsa_key
+ cp -a /etc/ssh/ssh_host_dsa_key ssh
+ test -e /etc/ssh/ssh_host_dsa_key.pub
+ cp -a /etc/ssh/ssh_host_dsa_key.pub ssh
+ test -e /etc/ssh/ssh_host_ecdsa_key
+ cp -a /etc/ssh/ssh_host_ecdsa_key ssh
+ test -e /etc/ssh/ssh_host_ecdsa_key.pub
+ cp -a /etc/ssh/ssh_host_ecdsa_key.pub ssh
+ test -e /etc/ssh/ssh_host_ed25519_key
+ cp -a /etc/ssh/ssh_host_ed25519_key ssh
+ test -e /etc/ssh/ssh_host_ed25519_key.pub
+ cp -a /etc/ssh/ssh_host_ed25519_key.pub ssh
+ test -e /etc/ssh/ssh_host_rsa_key
+ cp -a /etc/ssh/ssh_host_rsa_key ssh
+ test -e /etc/ssh/ssh_host_rsa_key.pub
+ cp -a /etc/ssh/ssh_host_rsa_key.pub ssh
+ /root/kexec/kexec/ip --json addr
+ /root/kexec/kexec/ip -4 --json route
+ /root/kexec/kexec/ip -6 --json route
+ [ -f /etc/machine-id ]
+ cp /etc/machine-id machine-id
+ find .
+ gzip -9
+ cpio -o -H newc
27 blocks
+ kexecSyscallFlags=
+ + sort -c -V
uname -r
+ printf %s\n 6.1 6.5.0-1010-azure
+ kexecSyscallFlags=--kexec-syscall-auto
+ /root/kexec/kexec/kexec --load /root/kexec/kexec/bzImage --kexec-syscall-auto --initrd=/root/kexec/kexec/initrd --no-checks --command-line init=/nix/store/nadvk7k5qam9iq19kshbk2c045hkd5q6-nixos-system-nixos-23.11pre-git/init console=tty0 console=ttyS0,115200 loglevel=4
machine will boot into nixos in 6s...
+ echo machine will boot into nixos in 6s...
+ test -e /dev/kmsg
+ exec
ssh: connect to host localhost port 22: Connection refused
....
(the last line repeats endlessly)

But the VM loses some network connectivity after the kexec, so the script cannot reach the VM to finish the "nixosification".
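The trace above shows the run script comparing `6.1` against `uname -r` with `sort -c -V` before it sets `--kexec-syscall-auto`. A rough sketch of that check follows; the function names are made up for illustration, and the empty fallback only mirrors the `kexecSyscallFlags=` line in the trace, not necessarily the script's full logic.

```shell
#!/bin/sh
# Succeeds when version $2 is at least version $1.
# `sort -c -V` only checks that its input is already in version order.
kernel_at_least() {
  printf '%s\n' "$1" "$2" | sort -c -V 2>/dev/null
}

# Hypothetical wrapper: print the flag seen in the trace when the running
# kernel is 6.1 or newer, otherwise print an empty line.
kexec_syscall_flags() {
  if kernel_at_least 6.1 "$1"; then
    printf '%s\n' '--kexec-syscall-auto'
  else
    printf '\n'
  fi
}
```

For the kernel in the trace, `kexec_syscall_flags 6.5.0-1010-azure` prints `--kexec-syscall-auto`, which matches the flag passed to kexec there.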

On the VM, NixOS did kexec successfully and the ssh service is running :

[nixos@nixos:~]$ systemctl status sshd
● sshd.service - SSH Daemon
     Loaded: loaded (/etc/systemd/system/sshd.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-01-16 ; 14s ago
    Process: 636 ExecStartPre=/nix/store/n7lpzrgsj5kmwsnm8fvv8cawr8qycym6-unit->
   Main PID: 639 (sshd)
         IP: 0B in, 0B out
         IO: 1.3M read, 0B written
      Tasks: 1 (limit: 4195)
     Memory: 3.4M
        CPU: 133ms
     CGroup: /system.slice/sshd.service
             └─639 "sshd: /nix/store/9fkxlh9gyxnb7bahc2rn0b5fhamgb63m-openssh-9>

nixos systemd[1]: Starting SSH Daemon...
nixos systemd[1]: Started SSH Daemon.
nixos sshd[639]: Server listening on 0.0.0.0 port 22.
nixos sshd[639]: Server listening on :: port 22.

AkechiShiro avatar Jan 16 '24 19:01 AkechiShiro

By following some tips from https://github.com/nix-community/nixos-anywhere/issues/112 and also https://github.com/tiiuae/ghaf-infra/blob/main/docs/nixos-anywhere.md?plain=1#L138-L149

I was able to install NixOS using nixos-anywhere. I also had to use --post-kexec-ssh-port, as the port wasn't the default one.
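For illustration only, an invocation along these lines might look as follows. The flake reference, target address, and port are placeholders and not the values used here; the command is built as a string so it can be inspected before running.

```shell
#!/bin/sh
# All three values are placeholders, not taken from this thread.
FLAKE_REF='.#azure-vm'
TARGET='root@203.0.113.10'
POST_KEXEC_PORT=2222

# Print the nixos-anywhere command line instead of executing it.
nixos_anywhere_command() {
  printf 'nixos-anywhere --flake %s --post-kexec-ssh-port %s %s\n' \
    "$FLAKE_REF" "$POST_KEXEC_PORT" "$TARGET"
}
```

Calling `nixos_anywhere_command` prints the command; the output can then be copied and run once the values have been replaced.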

I will try to document the steps and create a PR in the future.

~~EDIT: I'm still lacking internet connectivity despite being able to reach the virtual machine using ssh :thinking: (dns seems to be working fine)~~ (I was wrong, everything works as intended.) EDIT 2: I also did the install with systemd-boot instead of grub.

AkechiShiro avatar Jan 16 '24 20:01 AkechiShiro

I think nixos-anywhere could automate this kexec step as well if it detects a locked down kernel.
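One possible detection signal, sketched under assumptions: on kernels with the lockdown LSM, `/sys/kernel/security/lockdown` lists the modes with the active one in brackets, for example `none [integrity] confidentiality`. The function names below are hypothetical, and lockdown is only one indicator; IMA appraisal and Secure Boot state would need their own checks.

```shell
#!/bin/sh
# Extract the bracketed (active) mode from lockdown-style text,
# e.g. "none [integrity] confidentiality" -> "integrity".
active_lockdown_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Hypothetical policy: any active mode other than "none" means a direct
# kexec is likely to be refused. Empty input (no lockdown support)
# counts as not locked down.
needs_workaround() {
  mode=$(active_lockdown_mode "$1")
  [ -n "$mode" ] && [ "$mode" != "none" ]
}
```

On a live system the input would be `needs_workaround "$(cat /sys/kernel/security/lockdown 2>/dev/null)"`.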

Mic92 avatar Jan 17 '24 14:01 Mic92

By lockdown, do you mean IMA being configured and enabled?

However, for SecureBoot-enabled machines we still don't have a solution. I believe the only way to have SecureBoot on Azure would probably be to first contact Microsoft to find out whether there is a process.

But there is probably no way to use nixos-anywhere unless we can sign the kernels with a key enrolled on the Azure machine.

AkechiShiro avatar Jan 17 '24 18:01 AkechiShiro

Is IMA not the mechanism that is in place in case the machine was booted with secure boot?

Mic92 avatar Jan 17 '24 18:01 Mic92

I think IMA is kind of an extension of SecureBoot that covers more files, but in my test I disabled SecureBoot on the machine. I'll do some tests with SecureBoot on and ima_appraise=off and report the result.

But so far, SecureBoot off with IMA appraisal off worked.

So with just SecureBoot off it should work too.

Note: sometimes the kexec also seems to fail, and the machine freezes after a null pointer dereference in the kernel, with one CPU core apparently stuck.

AkechiShiro avatar Jan 17 '24 19:01 AkechiShiro

@Mic92 it seems that if SecureBoot is enabled, it is not possible to kexec.

$ cat /proc/cmdline 
BOOT.... console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 consoleblank=0 ima_appraise=off

See, even with ima_appraise=off :

[   60.022694] PEFILE: Unsigned PE binary
[   60.024444] kexec_file: Enforced kernel signature verification failed (-61).

Also, ima_appraisal does not seem to exist? I only find ima_appraise=off documented as valid online. So without SecureBoot, adding ima_appraise=off may not be needed.

AkechiShiro avatar Jan 19 '24 18:01 AkechiShiro

Maybe it should be stated in the README that kexec doesn't work with secure boot

usama8800 avatar Sep 03 '24 19:09 usama8800

@usama8800 feel free to add it.

Mic92 avatar Sep 04 '24 16:09 Mic92