CRIU: Failure taking a checkpoint if SSSD daemon is active

Open tajila opened this issue 2 years ago • 8 comments

CRIU fails to dump a checkpoint if there is an active connection to SSSD:

root:instanton# more wlp/usr/servers/defaultServer/logs/checkpoint/checkpoint.log 
Warn  (compel/src/lib/infect.c:126): Unable to interrupt task: 861562 (Operation not permitted)
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861512 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861529 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861533 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861534 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861538 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861539 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861540 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861541 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861543 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861545 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861546 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861547 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:340): Will restore 861548 with interrupted system call
Error (criu/sk-unix.c:865): unix: External socket is used. Consider using --ext-unix-sk option.
Error (criu/cr-dump.c:2048): Dumping FAILED.

Stopping the daemon resolves the issue.

tajila avatar Aug 30 '22 22:08 tajila

@alon-sh Do you know which security components connect to SSSD? Can these connections be delayed until after restore?

FYI @ymanton

tajila avatar Aug 30 '22 22:08 tajila

Unfortunately I don't think it's just security components activating SSSD.

Glibc provides a collection of APIs called NSS (Name Service Switch), which allow programs to look up things like users, groups, hosts, and so on. Glibc provides a default implementation, but allows it to be overridden, which is what SSSD does.

Functions like getpwuid(uid) (get user info via uid), getgrgid(gid) (get group info via gid), gethostbyname(name) (get IP address of host), etc. are all NSS functions that, when called, will probably interface with SSSD.

NSS documentation is here: https://www.gnu.org/software/libc/manual/html_node/NSS-Basics.html
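
As a rough illustration of what such an NSS call looks like from C, the sketch below wraps getpwuid_r(), the reentrant variant of getpwuid(). The helper name lookup_user_name and the 4096-byte scratch buffer are assumptions made for this example and are not taken from OpenJ9 or OMR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: look up the login name for `uid` through NSS.
 * Returns 0 and fills `name` on success, 1 if no entry exists,
 * -1 on error (including a `name` buffer that is too small).
 */
int lookup_user_name(uid_t uid, char *name, size_t name_len)
{
    assert(name != NULL && name_len > 0);

    struct passwd pwd;
    struct passwd *result = NULL;
    char scratch[4096]; /* assumed large enough for a typical entry */

    /* getpwuid_r consults the services listed for "passwd" in
     * /etc/nsswitch.conf; if an SSSD module is among them, the lookup
     * may be routed to SSSD instead of /etc/passwd. */
    int rc = getpwuid_r(uid, &pwd, scratch, sizeof scratch, &result);
    if (rc != 0)
        return -1;
    if (result == NULL)
        return 1;
    if (strlen(pwd.pw_name) >= name_len)
        return -1;
    strcpy(name, pwd.pw_name);
    return 0;
}
```

Calling lookup_user_name(getuid(), buf, sizeof buf) from a small main() and running it under strace is one way to see which files and sockets a single lookup touches on a given system.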

We can probably figure out which NSS functions are being invoked prior to the checkpoint, but avoiding some of them might be difficult.

ymanton avatar Aug 31 '22 02:08 ymanton

The security code opens the file /dev/urandom to acquire random numbers. That cannot be disabled.

alon-sh avatar Aug 31 '22 05:08 alon-sh

@JasonFengJ9 Can you please post an update of your findings in this issue?

tajila avatar Sep 07 '22 20:09 tajila

Update:

The initial issue was reproduced on a RHEL 8.6 image: OpenLiberty cannot take a checkpoint while SSSD is running. A standalone test that uses SecureRandom for random number generation hit the same error, and a further experiment showed that even a HelloWorld app cannot take a checkpoint. Note: when SSSD is stopped, all test cases pass, with a successful checkpoint and a subsequent restore.

The exception thrown was org.eclipse.openj9.criu.SystemCheckpointException: Could not dump the JVM processs, err=-52. I tried to chase down the cause of this error code (-EBADE). The CRIU call chain in question is:

criu_dump()
-> criu_local_dump(criu_opts *opts)
-> send_req_and_recv_resp(criu_opts *opts, CriuReq *req, CriuResp **resp)
-> send_req_and_recv_resp_sk(int fd, criu_opts *opts, CriuReq *req, CriuResp **resp)
-> recv_resp(int socket_fd)
-> criu_resp__unpack(ProtobufCAllocator *allocator, size_t len, const uint8_t *data)
-> protobuf_c_message_unpack(&criu_resp__descriptor, allocator, len, data)

The unpacked response has !CriuResp->success, i.e. CRIU itself reported the failure. protobuf_c_message_unpack is an API from the installed protobuf-c library (a C implementation of Protocol Buffers). I have not been able to rebuild it yet.
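
For reference, on Linux the errno value 52 is EBADE. A small sketch for turning such a negative errno-style return code into readable text follows; the helper name describe_neg_errno is hypothetical, and the exact message text comes from the C library's strerror().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a negative errno-style return code
 * (such as the -52 above) as "<code> (<strerror text>)" into `out`.
 * Non-negative codes are reported as not being errors.
 * Returns `out` for convenient use in a printf argument list.
 */
char *describe_neg_errno(int rc, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);

    if (rc < 0)
        snprintf(out, out_len, "%d (%s)", rc, strerror(-rc));
    else
        snprintf(out, out_len, "%d (not an error)", rc);
    return out;
}
```

The code only says that the request failed; the reason still has to be read from the CRIU log, as done in the next paragraph.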

In criu.log, the line Error (criu/sk-unix.c:865): unix: External socket is used. Consider using --ext-unix-sk option. indicates that a pipe/socket connection, probably to SSSD, was open at checkpoint time. I am trying tcpdump/tshark to identify the socket, hoping we can delay that connection until after the checkpoint, but no luck yet.

Another option would be to apply --ext-unix-sk, which covers the case where the checkpoint captures only one side of a pipe/socket rather than both sides as in a normal checkpoint; that is, the restored application must expect a broken connection. This might work for simple test cases, but I am not sure about OpenLiberty.

JasonFengJ9 avatar Sep 07 '22 21:09 JasonFengJ9

Update:

Narrowed down the cause of the checkpoint failure: it is NSS APIs such as getpwuid(uid) [1]. A standalone C test case demonstrated that a single getpwuid(uid) call can make a checkpoint fail when SSSD is enabled. strace output shows that a socket connection is involved:

socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(3)                                = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(3)                                = 0
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=3069, ...}) = 0
read(3, "# Generated by authselect on Wed"..., 4096) = 3069
read(3, "", 4096)                       = 0
close(3)                                = 0
openat(AT_FDCWD, "./glibc-hwcaps/x86-64-v3/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./glibc-hwcaps/x86-64-v2/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/x86_64/x86_64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/x86_64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/x86_64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./x86_64/x86_64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./x86_64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./x86_64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=27663, ...}) = 0
mmap(NULL, 27663, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f104c0ce000
close(3)                                = 0
openat(AT_FDCWD, "/lib64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\34\0\0\0\0\0\0"..., 832) = 832
lseek(3, 39408, SEEK_SET)               = 39408
read(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32) = 32
fstat(3, {st_mode=S_IFREG|0755, st_size=46536, ...}) = 0
lseek(3, 39408, SEEK_SET)               = 39408
read(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32) = 32
mmap(NULL, 2139048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f104b4bd000
mprotect(0x7f104b4c7000, 2093056, PROT_NONE) = 0
mmap(0x7f104b6c6000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9000) = 0x7f104b6c6000
close(3)                                = 0
mprotect(0x7f104b6c6000, 4096, PROT_READ) = 0
munmap(0x7f104c0ce000, 27663)           = 0
openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=9253600, ...}) = 0
mmap(NULL, 9253600, PROT_READ, MAP_SHARED, 3, 0) = 0x7f104abe9000
fstat(3, {st_mode=S_IFREG|0664, st_size=9253600, ...}) = 0
fstat(3, {st_mode=S_IFREG|0664, st_size=9253600, ...}) = 0
getpid()                                = 105625
fstat(-1, 0x7ffd76cc9c30)               = -1 EBADF (Bad file descriptor)
getpid()                                = 105625
socket(AF_UNIX, SOCK_STREAM, 0)         = 4
fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
fcntl(4, F_GETFD)                       = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
connect(4, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/nss"}, 110) = 0
fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
poll([{fd=4, events=POLLOUT}], 1, 300000) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16
poll([{fd=4, events=POLLOUT}], 1, 300000) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
poll([{fd=4, events=POLLIN}], 1, 300000) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
poll([{fd=4, events=POLLIN}], 1, 300000) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\1\0\0\0", 4)                  = 4
poll([{fd=4, events=POLLOUT}], 1, 300000) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16
poll([{fd=4, events=POLLOUT}], 1, 300000) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
poll([{fd=4, events=POLLIN}], 1, 300000) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\30\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
poll([{fd=4, events=POLLIN}], 1, 300000) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\0\0\0\0\0\0\0\0", 8)          = 8
openat(AT_FDCWD, "./glibc-hwcaps/x86-64-v3/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./glibc-hwcaps/x86-64-v2/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/x86_64/x86_64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/x86_64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/x86_64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./x86_64/x86_64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./x86_64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./x86_64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=27663, ...}) = 0
mmap(NULL, 27663, PROT_READ, MAP_PRIVATE, 5, 0) = 0x7f104c0ce000
close(5)                                = 0
openat(AT_FDCWD, "/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 5
read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260'\0\0\0\0\0\0"..., 832) = 832
fstat(5, {st_mode=S_IFREG|0755, st_size=54344, ...}) = 0
mmap(NULL, 2172760, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x7f104a9d6000
mprotect(0x7f104a9e1000, 2097152, PROT_NONE) = 0
mmap(0x7f104abe1000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0xb000) = 0x7f104abe1000
mmap(0x7f104abe3000, 22360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f104abe3000
close(5)                                = 0
mprotect(0x7f104abe1000, 4096, PROT_READ) = 0
munmap(0x7f104c0ce000, 27663)           = 0
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=1591, ...}) = 0
lseek(5, 0, SEEK_SET)                   = 0
read(5, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1591
close(5)                                = 0

getpwuid(uid) is used by a few OpenJ9/OMR APIs, and there does not appear to be an easy way to delay those calls until after the checkpoint. Other NSS APIs, such as getgrgid(), are also in use.

Besides stopping SSSD completely, another workaround is to modify the Name Service Switch configuration file, /etc/nsswitch.conf. For InstantOn OpenLiberty, disabling the passwd: sss files systemd entry seems to be enough to let the checkpoint proceed while keeping the other SSSD services.
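
To check whether a given nsswitch.conf line still routes a database through SSSD, something like the following sketch could be used. It assumes the workaround amounts to sss no longer being listed for passwd. The function name is hypothetical and the parsing is deliberately simplified.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if `line` is the nsswitch.conf entry for database `db`
 * (for example "passwd") and lists `service` (for example "sss"),
 * otherwise 0. Simplified: a '#' ends the line, and bracketed actions
 * such as [NOTFOUND=return] are treated as ordinary tokens.
 */
int nss_line_lists_service(const char *line, const char *db, const char *service)
{
    assert(line != NULL && db != NULL && service != NULL);

    size_t db_len = strlen(db);
    size_t svc_len = strlen(service);
    if (svc_len == 0)
        return 0;

    while (isspace((unsigned char)*line))
        line++;
    /* The entry must start with "<db>:" */
    if (strncmp(line, db, db_len) != 0 || line[db_len] != ':')
        return 0;
    line += db_len + 1;

    /* Walk the whitespace-separated service tokens. */
    while (*line != '\0' && *line != '#') {
        while (isspace((unsigned char)*line))
            line++;
        const char *start = line;
        while (*line != '\0' && *line != '#' && !isspace((unsigned char)*line))
            line++;
        if ((size_t)(line - start) == svc_len
            && strncmp(start, service, svc_len) == 0)
            return 1;
    }
    return 0;
}
```

For example, nss_line_lists_service("passwd: sss files systemd", "passwd", "sss") returns 1, while the same call on "passwd: files systemd" returns 0.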

I think we might document this as a limitation, with workarounds.

@tajila thoughts?

[1] https://github.com/eclipse-openj9/openj9-omr/blob/647678f93a84d1b2789c8432d8991adcac7b3773/port/unix/omrsysinfo.c#L3815

JasonFengJ9 avatar Sep 12 '22 12:09 JasonFengJ9

Next steps from the meeting discussion:

  • Check if there is an API to explicitly stop SSSD client/server connection;
  • Explore options to skip the OpenJ9 calls to the APIs in question before checkpoint.

JasonFengJ9 avatar Sep 12 '22 17:09 JasonFengJ9

Update:

Identified the following usages of getpwuid() and j9sysinfo_get_username() (which invokes getpwuid() internally) that can cause the OpenLiberty checkpoint failure:

j9sysinfo_get_username() retrieves user.name:
https://github.com/eclipse-openj9/openj9/blob/2270a932fc4d51e458bfbc9b946d4952ec21c237/runtime/jcl/common/system.c#L455-L457
https://github.com/eclipse-openj9/openj9/blob/2270a932fc4d51e458bfbc9b946d4952ec21c237/runtime/port/sysvipc/j9sharedhelper.c#L236

getpwuid() retrieves user.home:
https://github.com/eclipse-openj9/openj9/blob/2270a932fc4d51e458bfbc9b946d4952ec21c237/runtime/jcl/unix/syshelp.c#L185-L187
https://github.com/eclipse-openj9/openj9/blob/2270a932fc4d51e458bfbc9b946d4952ec21c237/runtime/port/sysvipc/j9shmem.c#L1290-L1292

There are other usages of j9sysinfo_get_username(), getpwuid(), and j9sysinfo_get_groupname() (which invokes getgrgid() internally) that don't affect the checkpoint.

The APIs in question attempt to retrieve user.name and user.home, which triggers a socket connection underneath when SSSD is active.

Alternatively, when CRIU is in use, j9sysinfo_get_env() could be used to obtain those values from the environment variables USER and HOME; this was verified in a local build.
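
The idea can be sketched in plain C as follows. This is not the OpenJ9 change itself: it uses getenv() rather than j9sysinfo_get_env(), and the function name get_user_home and the prefer_env flag are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the current user's home directory into `out`.
 * When `prefer_env` is nonzero (standing in for "checkpoint is enabled"),
 * $HOME is consulted first so that no NSS lookup, and therefore no SSSD
 * socket, is needed. Returns 0 on success, -1 otherwise.
 */
int get_user_home(int prefer_env, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);

    if (prefer_env) {
        const char *home = getenv("HOME");
        if (home != NULL && home[0] != '\0' && strlen(home) < out_len) {
            strcpy(out, home);
            return 0;
        }
    }

    /* Fallback: NSS lookup, which may contact SSSD. */
    struct passwd pwd;
    struct passwd *result = NULL;
    char scratch[4096]; /* assumed large enough for a typical entry */
    if (getpwuid_r(getuid(), &pwd, scratch, sizeof scratch, &result) != 0
        || result == NULL || strlen(pwd.pw_dir) >= out_len)
        return -1;
    strcpy(out, pwd.pw_dir);
    return 0;
}
```

Falling back to NSS when HOME is unset is a design choice in this sketch. A stricter variant could return -1 instead, so that a checkpoint-enabled run never opens the SSSD socket. The same pattern would apply to USER for user.name.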

JasonFengJ9 avatar Sep 19 '22 12:09 JasonFengJ9

Resolved via https://github.com/eclipse-openj9/openj9/pull/15932

JasonFengJ9 avatar Sep 26 '22 15:09 JasonFengJ9