
[Bug] Loading big memory snapshot can put processes in uninterruptible sleep

Open CompuIves opened this issue 2 years ago • 1 comments

Describe the bug

Whenever we load a reasonably large memory snapshot (8GB) of a VM that has been running several Node processes, we notice that some processes get stuck in an uninterruptible sleep. This happens with both the uffd handler and the "default" snapshot loading. They seem to get stuck in a syscall, waiting for a page fault:

These are some processes in the guest that get stuck:

root@sandbox:/# cat /proc/556/stack 
[<0>] kvm_async_pf_task_wait_schedule+0x14b/0x180
[<0>] __kvm_handle_async_pf+0x51/0xb0
[<0>] exc_page_fault+0x1cd/0x430
[<0>] asm_exc_page_fault+0x1e/0x30

root@sandbox:/# cat /proc/556/syscall 
-1 0x7ffc7071bdb0 0xeedff3
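
For anyone trying to confirm the same symptom, here is a minimal sketch (not part of the original report) that scans the guest's /proc for processes in state D and prints their kernel stacks, like the manual `cat` commands above. The function names are hypothetical, and reading /proc/<pid>/stack generally requires root. A process can be in state D briefly during normal I/O, so a hit is only meaningful if it persists across repeated runs.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the kernel stack of pid, or a placeholder if unreadable."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<stack unavailable: {exc}>\n"


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(f"{pid} ({comm}) is in uninterruptible sleep")
        print(read_kernel_stack(pid))
```

Running this a few seconds apart inside the guest and comparing the PIDs helps separate transient D states from processes that are permanently stuck.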

strace:

read(21, "\0\1\4\234\1\1\3\10\2\3\nunilateral\10\2\3\fsubscri"..., 65536) = 417
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
futex(0x7fcee8000020, FUTEX_WAKE_PRIVATE, 1) = 1
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
futex(0x7fcee0000020, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7fcee0000020, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7fcee0000020, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x7fcee0000020, FUTEX_WAKE_PRIVATE, 1) = 0
write(16, "\1\0\0\0\0\0\0\0", 8)        = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 0) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
mprotect(0x23a6580000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x381b42940000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0xb96a3c40000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x2c373db00000, 262144, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb442000, 86016, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb5c2000, 249856, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb482000, 249856, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb4c2000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb502000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x358edb542000, 4096, PROT_READ|PROT_WRITE) = 0
futex(0x5f99b1c, FUTEX_WAIT_PRIVATE, 0, NULL

No particular dmesg logs.

We've seen this happen on hosts that have AMD Epyc CPUs. We haven't tested with other CPUs.

Interestingly, this only happens in certain host/guest kernel version combinations.

To Reproduce

  1. Start a VM with 8GB of memory, start a simple Node server
  2. Save snapshot
  3. Resume VM from snapshot
  4. After a while, some processes in the VM will get into an uninterruptible sleep
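
Steps 2 and 3 above could be scripted roughly as follows. This is a sketch under assumptions, not the setup used in the report: the socket and file paths are placeholders, the helper names are hypothetical, and the request bodies follow the Firecracker snapshot API as I understand it for the v1.0/v1.1 era (`mem_file_path` on both create and load). The load request has to go to a freshly started Firecracker process that has not been configured yet.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection over a Unix domain socket (Firecracker's API socket)."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def snapshot_requests(snapshot_path, mem_path):
    """Requests to pause a running microVM and write a full snapshot."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
        }),
    ]


def restore_requests(snapshot_path, mem_path):
    """Requests to load the snapshot in a fresh process and resume it."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
        }),
        ("PATCH", "/vm", {"state": "Resumed"}),
    ]


def send(socket_path, requests):
    """Send each request in order; raise on the first non-2xx response."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            payload = resp.read().decode()
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path}: {resp.status} {payload}")
        finally:
            conn.close()


if __name__ == "__main__":
    # Placeholder paths; adjust to the local setup.
    send("/tmp/firecracker-a.sock", snapshot_requests("/tmp/vm.snap", "/tmp/vm.mem"))
    send("/tmp/firecracker-b.sock", restore_requests("/tmp/vm.snap", "/tmp/vm.mem"))
```

Keeping the request lists separate from `send` makes it easy to print or diff the exact payloads when comparing a working and a failing host/guest kernel combination.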


Expected behaviour

The processes should continue responding.

Environment

  • Firecracker v1.0 and v1.1
  • I can share a RootFS and kernel if that makes it easier
  • AMD
  • Tested on btrfs, zfs & ext4

| Host Kernel | Guest Kernel | Works? |
|-------------|--------------|--------|
| 5.4         | 5.10         | Yes    |
| 5.10        | 5.10         | No     |
| 5.10        | 5.11         | No     |
| 5.10        | 5.15         | No     |
| 5.10        | 5.4          | Yes    |

Checks

  • [x] Have you searched the Firecracker Issues database for similar problems?
  • [x] Have you read the existing relevant Firecracker documentation?
  • [x] Are you certain the bug being reported is a Firecracker issue?

CompuIves, Jun 01 '22 10:06

In order to get a better understanding of the issue, could you please share the guest kernel and rootfs? Also, it would be useful to have a script that replicates your running scenario (starting the processes that you see failing).

Could you please also share some info on the host setup (distro and kernel config, if available)?

bchalios, Jun 28 '22 08:06

Hi @CompuIves, are you still experiencing this issue? I see that you merged a fix in your fork. Has that solved the problem for you?

luminitavoicu, Nov 24 '22 07:11

Hey! The commit in our fork fixed the issue. That said, we're removing the fix in a future version where we rely on UFFD to handle page faults. We've updated the host kernel to Linux 6 and guest kernel to 5.15 in this scenario, and we cannot reproduce the issue anymore.

CompuIves, Nov 24 '22 19:11

Hi @CompuIves, if you are able to reproduce this issue with currently supported versions of Firecracker and kernels, please feel free to post the results and re-open.

mattschlebusch, May 09 '23 08:05