firecracker
[Bug] Loading big memory snapshot can put processes in uninterruptible sleep
Describe the bug
Whenever we load a reasonably large memory snapshot (8 GB) of a VM that has been running several Node.js processes, some of those processes get stuck in uninterruptible sleep. This happens both with the UFFD handler and with the default snapshot loading. The processes seem to get stuck on a syscall, waiting for a page fault to be resolved:
Here is the state of one of the stuck processes in the guest:
```
root@sandbox:/# cat /proc/556/stack
[<0>] kvm_async_pf_task_wait_schedule+0x14b/0x180
[<0>] __kvm_handle_async_pf+0x51/0xb0
[<0>] exc_page_fault+0x1cd/0x430
[<0>] asm_exc_page_fault+0x1e/0x30
root@sandbox:/# cat /proc/556/syscall
-1 0x7ffc7071bdb0 0xeedff3
```
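To spot stuck tasks in the guest without inspecting each PID by hand, a small scanner can check every process's state and kernel stack. This is a hypothetical diagnostic sketch, not something from the report: it assumes the guest exposes `/proc/<pid>/stat` and `/proc/<pid>/stack` (the latter needs root and a kernel built with `CONFIG_STACKTRACE`), and it flags tasks in state `D` whose stack contains `kvm_async_pf_task_wait_schedule`. It also parses `/proc/<pid>/syscall`; according to proc(5), a leading `-1` there means the task is blocked but not inside a system call, with only the stack pointer and program counter following, which appears consistent with the page-fault frames in the stack above.

```python
import os
from typing import List, NamedTuple, Optional, Tuple

# Frame seen in the stuck guest task's kernel stack (taken from the report).
ASYNC_PF_FRAME = "kvm_async_pf_task_wait_schedule"


class SyscallInfo(NamedTuple):
    nr: Optional[int]      # None when the task is blocked outside a syscall
    args: Tuple[int, ...]  # six argument registers; empty when nr is None
    sp: int                # stack pointer
    pc: int                # program counter


def parse_proc_syscall(text: str) -> Optional[SyscallInfo]:
    """Parse /proc/<pid>/syscall; returns None if the task is running."""
    fields = text.split()
    if not fields or fields[0] == "running":
        return None
    nr = int(fields[0])
    values = [int(f, 16) for f in fields[1:]]  # int() accepts the 0x prefix
    if nr < 0:
        # Blocked, but not in a system call: only SP and PC follow.
        sp, pc = values
        return SyscallInfo(None, (), sp, pc)
    # Syscall number, six argument registers, then SP and PC.
    return SyscallInfo(nr, tuple(values[:6]), values[6], values[7])


def task_state(stat_text: str) -> str:
    """Return the one-letter state field from /proc/<pid>/stat."""
    # comm is in parentheses and may contain spaces, so split after the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]


def stuck_on_async_pf(stat_text: str, stack_text: str) -> bool:
    """True if the task is in uninterruptible sleep (D) in the async PF wait."""
    return task_state(stat_text) == "D" and ASYNC_PF_FRAME in stack_text


def find_stuck_tasks(proc: str = "/proc") -> List[int]:
    """Return PIDs whose main thread matches stuck_on_async_pf.

    Only the thread-group leader is checked, not every thread under task/.
    Entries that vanish or cannot be read (e.g. without root) are skipped.
    """
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        base = os.path.join(proc, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            with open(os.path.join(base, "stack")) as f:
                stack = f.read()
        except OSError:
            continue  # task exited or permission denied
        if stuck_on_async_pf(stat, stack):
            stuck.append(int(entry))
    return sorted(stuck)
```

Running `find_stuck_tasks()` as root inside the guest returns the PIDs to feed into `parse_proc_syscall(open(f"/proc/{pid}/syscall").read())` or strace.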
strace output:
```
read(21, "\0\1\4\234\1\1\3\10\2\3\nunilateral\10\2\3\fsubscri"..., 65536) = 417
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024) = 8
write(16, "\1\0\0\0\0\0\0\0", 8) = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024) = 8
write(16, "\1\0\0\0\0\0\0\0", 8) = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024) = 8
futex(0x7fcee8000020, FUTEX_WAKE_PRIVATE, 1) = 1
write(16, "\1\0\0\0\0\0\0\0", 8) = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024) = 8
futex(0x7fcee0000020, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7fcee0000020, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7fcee0000020, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x7fcee0000020, FUTEX_WAKE_PRIVATE, 1) = 0
write(16, "\1\0\0\0\0\0\0\0", 8) = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024) = 8
mprotect(0x23a6580000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x381b42940000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0xb96a3c40000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x2c373db00000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb442000, 86016, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb5c2000, 249856, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb482000, 249856, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb4c2000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb502000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb542000, 4096, PROT_READ|PROT_WRITE) = 0
futex(0x5f99b1c, FUTEX_WAIT_PRIVATE, 0, NULL
```
There are no relevant messages in `dmesg`.
We've seen this happen on hosts with AMD EPYC CPUs; we haven't tested other CPUs.
Interestingly, this only happens with certain host/guest kernel version combinations.
To Reproduce
- Start a VM with 8 GB of memory and run a simple Node.js server in it
- Save a snapshot
- Resume the VM from the snapshot
- After a while, some processes in the VM get stuck in uninterruptible sleep
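The reproduction steps above can be driven through Firecracker's HTTP API on its Unix socket. The sketch below is a hypothetical helper, not part of Firecracker or of the original report: the request paths and field names (`PATCH /vm`, `PUT /snapshot/create`, and `PUT /snapshot/load` with `mem_backend`) reflect our reading of the v1.1 snapshot API and may differ in other releases, so check the API specification of the version in use. All socket and file paths are placeholders. The VM has to be paused before a snapshot is created, and the snapshot must be loaded into a fresh Firecracker process that has not booted a guest.

```python
import http.client
import json
import socket
from typing import Any, Dict, Tuple

# (HTTP method, API path, JSON body)
Request = Tuple[str, str, Dict[str, Any]]


def pause_vm() -> Request:
    # The microVM must be paused before a snapshot can be created.
    return ("PATCH", "/vm", {"state": "Paused"})


def create_full_snapshot(snapshot_path: str, mem_file_path: str) -> Request:
    return ("PUT", "/snapshot/create", {
        "snapshot_type": "Full",
        "snapshot_path": snapshot_path,
        "mem_file_path": mem_file_path,
    })


def load_snapshot(snapshot_path: str, backend_path: str,
                  use_uffd: bool = False) -> Request:
    # With "File", backend_path is the guest memory file; with "Uffd", it is
    # the Unix socket of the page-fault handler process.
    return ("PUT", "/snapshot/load", {
        "snapshot_path": snapshot_path,
        "mem_backend": {
            "backend_type": "Uffd" if use_uffd else "File",
            "backend_path": backend_path,
        },
        "enable_diff_snapshots": False,
        "resume_vm": True,
    })


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str) -> None:
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def send(socket_path: str, request: Request) -> int:
    """Send one API request; raise on a non-2xx status, return the status."""
    method, path, body = request
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        payload = resp.read()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} -> {resp.status}: {payload!r}")
        return resp.status
    finally:
        conn.close()
```

For example, `send("/tmp/fc-a.sock", pause_vm())` followed by `send("/tmp/fc-a.sock", create_full_snapshot("/tmp/vm.snap", "/tmp/vm.mem"))` saves the snapshot, and `send("/tmp/fc-b.sock", load_snapshot("/tmp/vm.snap", "/tmp/vm.mem"))` restores it into a second, freshly started Firecracker process.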

Expected behaviour
The processes should continue responding.
Environment
- Firecracker v1.0 and v1.1
- I can share a rootfs and guest kernel if that makes it easier
- Host CPU: AMD EPYC
- Tested on btrfs, ZFS, and ext4
| Host Kernel | Guest Kernel | Works? |
|---|---|---|
| 5.4 | 5.10 | Yes |
| 5.10 | 5.10 | No |
| 5.10 | 5.11 | No |
| 5.10 | 5.15 | No |
| 5.10 | 5.4 | Yes |
Checks
- [x] Have you searched the Firecracker Issues database for similar problems?
- [x] Have you read the existing relevant Firecracker documentation?
- [x] Are you certain the bug being reported is a Firecracker issue?
In order to get a better understanding of the issue, could you please share the guest kernel and rootfs? Also, it would be useful to have a script that replicates your running scenario (starting the processes that you see failing).
Could you please also share some information about the host setup (distro and kernel config, if available)?
Hi @CompuIves, are you still experiencing this issue? I see that you merged a fix in your fork, has that solved the problem for you?
Hey! The commit in our fork fixed the issue. That said, we're dropping the fix in a future version, where we rely on UFFD to handle page faults. In that setup we've updated the host kernel to Linux 6 and the guest kernel to 5.15, and we can no longer reproduce the issue.
Hi @CompuIves, if you are able to reproduce this issue with currently supported versions of Firecracker and the kernel, please feel free to post the results and reopen.