[Bug] Loading big memory snapshot can put processes in uninterruptible sleep

Open CompuIves opened this issue 2 years ago • 1 comments

Describe the bug

Whenever we load a reasonably sized memory snapshot (8GB) of a VM that has been running several node processes, we notice that some processes get stuck in an uninterruptible sleep. This happens both with the uffd handler and with the "default" snapshot loading. They appear to be stuck waiting on a page fault:

These are some processes in the guest that get stuck:

root@sandbox:/# cat /proc/556/stack 
[<0>] kvm_async_pf_task_wait_schedule+0x14b/0x180
[<0>] __kvm_handle_async_pf+0x51/0xb0
[<0>] exc_page_fault+0x1cd/0x430
[<0>] asm_exc_page_fault+0x1e/0x30

root@sandbox:/# cat /proc/556/syscall 
-1 0x7ffc7071bdb0 0xeedff3
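
The manual checks above can be automated inside the guest. What follows is a minimal sketch, not part of the original report: it scans `/proc` for tasks in state `D` (uninterruptible sleep) and interprets the first field of `/proc/<pid>/syscall`. Per proc(5), a leading `-1` there means the task is blocked outside a system call, which is consistent with the page-fault stack shown above. The function names are hypothetical.

```python
import os
from typing import Iterator, Tuple


def parse_stat_state(stat_line: str) -> Tuple[str, str]:
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it using the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return comm, state


def describe_syscall(syscall_line: str) -> str:
    """Summarise the contents of /proc/<pid>/syscall."""
    fields = syscall_line.split()
    if not fields:
        return "no data"
    if fields[0] == "running":
        return "running"
    if fields[0] == "-1":
        # Blocked, but not inside a system call (e.g. in a page fault).
        return "blocked outside a syscall"
    return f"in syscall {fields[0]}"


def find_uninterruptible(proc_root: str = "/proc") -> Iterator[Tuple[int, str]]:
    """Yield (pid, comm) for every task currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield int(entry), comm
```

Calling `list(find_uninterruptible())` periodically after resuming the snapshot would show which processes are stuck; a task that stays in `D` across several samples is a candidate for inspecting `/proc/<pid>/stack` as root.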

strace:

read(21, "\0\1\4\234\1\1\3\10\2\3\nunilateral\10\2\3\fsubscri"..., 65536) = 417
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
futex(0x7fcee8000020, FUTEX_WAKE_PRIVATE, 1) = 1
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
futex(0x7fcee0000020, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7fcee0000020, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7fcee0000020, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x7fcee0000020, FUTEX_WAKE_PRIVATE, 1) = 0
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
mprotect(0x23a6580000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x381b42940000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0xb96a3c40000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x2c373db00000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb442000, 86016, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb5c2000, 249856, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb482000, 249856, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb4c2000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb502000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb542000, 4096, PROT_READ|PROT_WRITE) = 0
futex(0x5f99b1c, FUTEX_WAIT_PRIVATE, 0, NULL

No particular dmesg logs.

We've seen this happen on hosts that have AMD EPYC CPUs. We haven't tested with other CPUs.

Interestingly, this only happens in certain host/guest kernel version combinations.

To Reproduce

1. Start a VM with 8GB of memory and start a simple Node server in it
2. Save a snapshot
3. Resume the VM from the snapshot
4. After a while, some processes in the VM will get into an uninterruptible sleep
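
The snapshot and resume steps can be scripted against the Firecracker API socket. This is a rough sketch under stated assumptions, not the exact procedure used for this report: the socket and file paths passed in are hypothetical, and the request bodies use the `snapshot_path`, `mem_file_path` and `resume_vm` fields as I understand the v1.x snapshot API, so check them against the API specification for your Firecracker version. It uses the default file-backed loading, not the uffd handler.

```python
import http.client
import json
import socket
from typing import List, Tuple

Request = Tuple[str, str, dict]  # (HTTP method, path, JSON body)


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection over a Unix domain socket (the Firecracker API socket)."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def snapshot_requests(snapshot_path: str, mem_path: str) -> List[Request]:
    """Requests for the running VM: pause it, then write a full snapshot."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
        }),
    ]


def restore_requests(snapshot_path: str, mem_path: str) -> List[Request]:
    """Request for a fresh Firecracker process: load the snapshot and resume."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
            "resume_vm": True,
        }),
    ]


def send(socket_path: str, requests: List[Request]) -> None:
    """Send each request in order and fail on the first non-2xx response."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            payload = resp.read().decode()
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} failed: {resp.status} {payload}")
        finally:
            conn.close()
```

The snapshot requests go to the API socket of the running VM; the restore request goes to the socket of a newly started Firecracker process that has not yet been configured.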

Expected behaviour

The processes should continue responding.

Environment

Firecracker v1.0 and v1.1
I can share a RootFS and kernel if that makes it easier
Host CPU: AMD EPYC
Tested on btrfs, zfs & ext4

Host Kernel	Guest Kernel	Works?
5.4	5.10	Yes
5.10	5.10	No
5.10	5.11	No
5.10	5.15	No
5.10	5.4	Yes

Checks

[x] Have you searched the Firecracker Issues database for similar problems?
[x] Have you read the existing relevant Firecracker documentation?
[x] Are you certain the bug being reported is a Firecracker issue?

Jun 01 '22 10:06 CompuIves

To get a better understanding of the issue, could you please share the guest kernel and rootfs? It would also be useful to have a script that replicates your scenario (starting the processes that you see failing).

Could you also please share some info on the host setup (distro and kernel config, if available)?

Jun 28 '22 08:06 bchalios

Hi @CompuIves, are you still experiencing this issue? I see that you merged a fix in your fork. Has that solved the problem for you?

Nov 24 '22 07:11 luminitavoicu

Hey! The commit in our fork fixed the issue. That said, we're removing the fix in a future version where we rely on UFFD to handle page faults. We've updated the host kernel to Linux 6 and guest kernel to 5.15 in this scenario, and we cannot reproduce the issue anymore.

Nov 24 '22 19:11 CompuIves

Hi @CompuIves, if you are able to reproduce this issue with currently supported versions of Firecracker and kernels, please feel free to post the results and re-open.

May 09 '23 08:05 mattschlebusch
