
Restored process tree uses more memory when Transparent Huge Pages is enabled

Open xiongzubiao opened this issue 2 years ago • 4 comments

Description

When Transparent Huge Pages (THP) is enabled, a restored process tree (with a forked child) might use more memory.

Steps to reproduce the issue:

  1. Start a process which allocates some memory with malloc (e.g., 10MiB) and forks a child. Measure its RSS.
  2. Checkpoint the process tree.
  3. Disable THP. Restore from the checkpoint, and measure the RSS.
  4. Enable THP. Restore from the checkpoint again, and measure the RSS.
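The steps above do not include the workload itself. A minimal sketch of one, written in Python rather than C, could look like the following. It assumes a Linux `/proc` filesystem. The `bytearray` stands in for the `malloc` call, and the names `ALLOC_BYTES`, `rss_kib`, `own_rss_kib`, and `run_workload` are illustrative, not taken from the original reproducer.

```python
import os
import time

ALLOC_BYTES = 10 * 1024 * 1024  # 10 MiB, the example size from step 1


def rss_kib(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


def own_rss_kib():
    """Read the RSS of the calling process (Linux only)."""
    with open("/proc/self/status") as f:
        return rss_kib(f.read())


def run_workload(hold_seconds):
    """Allocate and touch memory, fork a child, then idle in both processes."""
    buf = bytearray(ALLOC_BYTES)             # assumption: stands in for malloc(10 MiB)
    for off in range(0, ALLOC_BYTES, 4096):  # write one byte per 4 KiB page
        buf[off] = 1
    pid = os.fork()
    role = "child" if pid == 0 else "parent"
    print(f"{role} pid={os.getpid()} rss={own_rss_kib()} kB", flush=True)
    time.sleep(hold_seconds)                 # stay alive so the tree can be dumped
    if pid == 0:
        os._exit(0)
    os.waitpid(pid, 0)
```

Calling `run_workload(3600)` from a script prints the PID and RSS of both processes and then idles, which leaves time to checkpoint the tree rooted at the parent PID.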

Describe the results you received: When THP is enabled, the restored processes use more memory (roughly 0.5 to 2 MiB more; the amount varies). The memory region whose RSS increases does not actually use huge pages (AnonHugePages is zero). In addition, the Referenced value in the parent process's smaps increases, but that of the child process does not.

  • THP disabled:

    • Parent process smaps:
    7f010d45d000-7f010e69c000 rw-p 00000000 00:00 0
    Size:              18684 kB
    Rss:               10504 kB
    Pss:                5258 kB
    Shared_Clean:          0 kB
    Shared_Dirty:      10492 kB
    Private_Clean:         0 kB
    Private_Dirty:        12 kB
    Referenced:        10504 kB
    Anonymous:         10504 kB
    AnonHugePages:         0 kB
    Swap:                  0 kB
    KernelPageSize:        4 kB
    MMUPageSize:           4 kB
    Locked:                0 kB
    ProtectionKey:         0
    VmFlags: rd wr mr mp me ac sd
    
    • Child process smaps:
    7f010d45d000-7f010e69c000 rw-p 00000000 00:00 0
    Size:              18684 kB
    Rss:               10508 kB
    Pss:                5262 kB
    Shared_Clean:          0 kB
    Shared_Dirty:      10492 kB
    Private_Clean:         0 kB
    Private_Dirty:        16 kB
    Referenced:        10508 kB
    Anonymous:         10508 kB
    AnonHugePages:         0 kB
    Swap:                  0 kB
    KernelPageSize:        4 kB
    MMUPageSize:           4 kB
    Locked:                0 kB
    ProtectionKey:         0
    VmFlags: rd wr mr mp me ac sd
    
  • THP enabled:

    • Parent process smaps:
    7f010d45d000-7f010e69c000 rw-p 00000000 00:00 0
    Size:              18684 kB
    Rss:               12508 kB
    Pss:                6262 kB
    Shared_Clean:          0 kB
    Shared_Dirty:      12492 kB
    Private_Clean:         0 kB
    Private_Dirty:        16 kB
    Referenced:        12508 kB
    Anonymous:         12508 kB
    AnonHugePages:         0 kB
    Swap:                  0 kB
    KernelPageSize:        4 kB
    MMUPageSize:           4 kB
    Locked:                0 kB
    ProtectionKey:         0
    VmFlags: rd wr mr mp me ac sd
    
    • Child process smaps:
    7f010d45d000-7f010e69c000 rw-p 00000000 00:00 0
    Size:              18684 kB
    Rss:               12508 kB
    Pss:                6262 kB
    Shared_Clean:          0 kB
    Shared_Dirty:      12492 kB
    Private_Clean:         0 kB
    Private_Dirty:        16 kB
    Referenced:        10508 kB
    Anonymous:         12508 kB
    AnonHugePages:         0 kB
    Swap:                  0 kB
    KernelPageSize:        4 kB
    MMUPageSize:           4 kB
    Locked:                0 kB
    ProtectionKey:         0
    VmFlags: rd wr mr mp me ac sd
    

Describe the results you expected: No RSS increase when THP is enabled.
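To make the differences between the smaps entries above easier to see, here is a small sketch that parses an entry and reports only the fields that changed. The sample strings are excerpts of the values posted above; `parse_smaps_entry` and `changed_fields` are hypothetical helper names, not part of CRIU.

```python
def parse_smaps_entry(text):
    """Map each 'Name: <n> kB' field of one smaps entry to its value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        parts = rest.split()
        # Keep only "<number> kB" fields; skip the address line, VmFlags, etc.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name] = int(parts[0])
    return fields


def changed_fields(before, after):
    """Return {field: after - before} for fields whose value differs."""
    return {k: after[k] - before[k]
            for k in before if k in after and after[k] != before[k]}


# Excerpts of the values reported above (kB).
PARENT_NO_THP = """7f010d45d000-7f010e69c000 rw-p 00000000 00:00 0
Rss:               10504 kB
Referenced:        10504 kB
AnonHugePages:         0 kB"""
PARENT_THP = """7f010d45d000-7f010e69c000 rw-p 00000000 00:00 0
Rss:               12508 kB
Referenced:        12508 kB
AnonHugePages:         0 kB"""
CHILD_NO_THP = """Rss:               10508 kB
Referenced:        10508 kB
AnonHugePages:         0 kB"""
CHILD_THP = """Rss:               12508 kB
Referenced:        10508 kB
AnonHugePages:         0 kB"""

parent_delta = changed_fields(parse_smaps_entry(PARENT_NO_THP),
                              parse_smaps_entry(PARENT_THP))
child_delta = changed_fields(parse_smaps_entry(CHILD_NO_THP),
                             parse_smaps_entry(CHILD_THP))
print(parent_delta)  # {'Rss': 2004, 'Referenced': 2004}
print(child_delta)   # {'Rss': 2000}
```

For these excerpts, Rss grows by about 2 MB in both processes, AnonHugePages stays at zero, and Referenced grows only in the parent, which matches the description in the report.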

Additional information you deem important (e.g. issue happens only occasionally):

CRIU logs and information:

CRIU full dump/restore logs:

The restore logs show little difference between THP disabled and THP enabled.

restore_thp.log restore_no_thp.log

Output of `criu --version`:

Version: 3.17
GitID: v3.17-153-g0965eab83

Output of `criu check --all`:

Error (criu/cr-check.c:757): Kernel doesn't support PTRACE_O_SUSPEND_SECCOMP
Error (criu/cr-check.c:802): Dumping seccomp filters not supported: Input/output error
Error (criu/cr-check.c:1031): cgroupns not supported. This is not fatal.
Warn  (criu/cr-check.c:1255): Do not have API to map vDSO - will use mremap() to restore vDSO
Warn  (criu/cr-check.c:1244): clone3() with set_tid not supported
Error (criu/cr-check.c:1286): Time namespaces are not supported
Error (criu/cr-check.c:1296): IFLA_NEW_IFINDEX isn't supported
Warn  (criu/cr-check.c:1313): Pidfd store requires pidfd_open syscall which is not supported
Warn  (criu/cr-check.c:1347): Nftables based locking requires libnftables and set concatenations support
Warn  (criu/cr-check.c:813): ptrace(PTRACE_GET_RSEQ_CONFIGURATION) isn't supported. C/R of processes which are using rseq() won't work.
Warn  (criu/cr-check.c:1173): compat_cr is not supported. Requires kernel >= v4.12
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

Additional environment details:

xiongzubiao, Feb 14 '23 00:02

Is THP enabled before step 1? Does RSS change if THP is kept enabled during C/R? What happens without C/R at all, i.e.:

  1. enable THP
  2. start a process which allocates some memory with malloc (e.g., 10MiB) and forks a child; measure its RSS.
  3. disable THP
  4. enable THP
  5. measure RSS
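The enable/disable steps in this list can be scripted through the THP sysfs knob. The sketch below assumes the standard `/sys/kernel/mm/transparent_hugepage/enabled` file and root privileges for writing. The thread does not say which modes "enable" and "disable" refer to; treating them as `always` and `never` is an assumption. `active_thp_mode` and `set_thp_mode` are illustrative names.

```python
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"
THP_MODES = ("always", "madvise", "never")


def active_thp_mode(sysfs_text):
    """Return the bracketed mode from text such as 'always [madvise] never'."""
    for word in sysfs_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError("no active mode found in: %r" % sysfs_text)


def read_thp_mode(path=THP_ENABLED_PATH):
    """Read the currently active THP mode from sysfs."""
    with open(path) as f:
        return active_thp_mode(f.read())


def set_thp_mode(mode, path=THP_ENABLED_PATH):
    """Write one of THP_MODES to the sysfs knob (needs root on a real system)."""
    if mode not in THP_MODES:
        raise ValueError("unknown THP mode: %r" % mode)
    with open(path, "w") as f:
        f.write(mode)
```

Under the stated assumption, `set_thp_mode("never")` followed by `set_thp_mode("always")` covers steps 3 and 4, with an RSS measurement of the process taken afterwards.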

rppt, Feb 14 '23 09:02

> Is THP enabled before step 1?

The behavior is the same whether or not THP is enabled before step 1.

> Does RSS change if THP is kept enabled during C/R?

If THP is kept enabled, the RSS increases with each C/R cycle; it almost tripled after 30 cycles. The size of pages-xxx.img grows consistently with the RSS, so the checkpoint step itself appears to work correctly.
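One way to track this growth over repeated cycles is to sum the page image sizes after every dump. The sketch below assumes the images directory contains files matching `pages-*.img`. The exact command lines used for this report are not given, so `dump_cmd` and `restore_cmd` only build commonly documented `criu` invocations (`-t`, `-D`, `--shell-job`, `--restore-detached`) and are an assumption; all function names are illustrative.

```python
import glob
import os


def pages_image_bytes(images_dir):
    """Sum the sizes of all pages-*.img files in a CRIU images directory."""
    pattern = os.path.join(images_dir, "pages-*.img")
    return sum(os.path.getsize(p) for p in glob.glob(pattern))


def dump_cmd(pid, images_dir):
    """Build a 'criu dump' command line for the tree rooted at pid (assumed flags)."""
    return ["criu", "dump", "-t", str(pid), "-D", images_dir, "--shell-job"]


def restore_cmd(images_dir):
    """Build a 'criu restore' command line that detaches after restore (assumed flags)."""
    return ["criu", "restore", "-D", images_dir, "--shell-job", "--restore-detached"]


def growth_ratio(sizes):
    """Return the last recorded size divided by the first one."""
    if not sizes or sizes[0] == 0:
        raise ValueError("need a non-empty list with a non-zero first size")
    return sizes[-1] / sizes[0]
```

A driver loop would run `dump_cmd` with `subprocess.run`, record `pages_image_bytes` for that cycle's directory, run `restore_cmd`, and repeat. A `growth_ratio` close to 3 after 30 cycles would correspond to the tripling described here.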

> What happens without C/R at all?

RSS does not change without C/R. The process essentially idles after forking a child.

xiongzubiao, Feb 14 '23 16:02

BTW, if the process doesn't fork a child, RSS does not increase after C/R.
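The dependence on the fork is consistent with the smaps numbers in the report, where almost the whole region is shared between two processes. Pss charges each shared page to a process in proportion to the number of processes mapping it, so with two sharers it should equal the private size plus half the shared size. The sketch below checks that arithmetic against the posted values. It only confirms the sharing and does not identify the cause of the increase; `expected_pss_kib` is an illustrative name.

```python
def expected_pss_kib(private_kib, shared_kib, sharers):
    """Pss estimate: private pages in full, shared pages split among sharers."""
    return private_kib + shared_kib // sharers


# (label, Private_Dirty, Shared_Dirty, reported Pss), all in kB, from the report.
REPORTED = [
    ("parent, THP disabled", 12, 10492, 5258),
    ("child, THP disabled", 16, 10492, 5262),
    ("parent, THP enabled", 16, 12492, 6262),
    ("child, THP enabled", 16, 12492, 6262),
]

for label, private, shared, pss in REPORTED:
    estimate = expected_pss_kib(private, shared, sharers=2)
    print(f"{label}: estimated {estimate} kB, reported {pss} kB")
```

All four estimates equal the reported Pss values, so the extra pages that appear with THP enabled are also shared between the parent and the child.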

xiongzubiao, Feb 14 '23 16:02

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot], Mar 24 '23 00:03