
IPC leader does not wait for all processes to finish

Open boryspoplawski opened this issue 3 years ago • 25 comments

Description of the problem

If the first process, which is the IPC leader, exits, all other processes cannot use IPC any more. We need to decide on a first process exit policy and implement it. I see two options:

  • IPC leader kills all other processes before exiting
  • IPC leader just cleans up app resources, but stays alive as long as other processes are still running.

We already have a TODO about it: https://github.com/oscarlab/graphene/blob/4f8d6fb8b323d62f00be2ac2c7239878b14afc1e/LibOS/shim/src/sys/shim_exit.c#L23-L27

boryspoplawski avatar Jul 06 '21 15:07 boryspoplawski
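
To make the two policies above concrete, here is a small host-level toy model in plain POSIX C. This is only an illustration, not Gramine code, and all names in it are made up: a "leader" spawns a few workers, finishes its own work, and then either kills everything (option 1) or stays around until every worker has exited (option 2).

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_WORKERS 4

static pid_t workers[NUM_WORKERS];

/* Option 1: the leader kills all other processes before exiting. */
static void policy_kill_all(void) {
    for (int i = 0; i < NUM_WORKERS; i++)
        kill(workers[i], SIGKILL);
}

/* Option 2: the leader is done with its own work, but keeps running (and could
 * keep serving IPC) until every worker has exited. */
static void policy_wait_all(void) {
    while (waitpid(-1, NULL, 0) > 0)
        ;
}

int main(int argc, char** argv) {
    for (int i = 0; i < NUM_WORKERS; i++) {
        workers[i] = fork();
        if (workers[i] == 0) {
            sleep(i + 1);   /* the worker is still busy after the leader is done */
            _exit(0);
        }
    }
    /* The leader's own application work ends here; now it must pick a policy. */
    if (argc > 1)
        policy_kill_all();
    else
        policy_wait_all();
    return 0;
}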

The second option seems better to me (I think it's fine if the main process of a user app just spawns something and then exits right away leaving the child).

mkow avatar Jul 07 '21 00:07 mkow

(I think it's fine if the main process of a user app just spawns something and then exits right away leaving the child)

But then Graphene will have unexpected behavior: the caller of this user app expects the main process to exit immediately, but instead the caller will hang because the main process is still alive. E.g., when we start Memcached in daemon mode, we expect memcached -d to return immediately so we can continue with our Bash session, but with this second option, the process will simply live forever (until its child is killed). This sounds like a surprising side effect for end users.

dimakuv avatar Jul 07 '21 06:07 dimakuv

@dimakuv Well, that call would have graphene-something prepended, so it wouldn't be that surprising. Also, going with the first option would just... kill Memcached.

boryspoplawski avatar Jul 07 '21 09:07 boryspoplawski

Yes, neither option works for this daemon case. I was thinking about a third option, "electing a new IPC leader", but this sounds... complex.

If choosing between these two options, I also vote for the second option.

dimakuv avatar Jul 07 '21 10:07 dimakuv

Yes, neither option works for this daemon case.

It does not work in the sense that it will stay up and not become a "real daemon". But I think if you want to have Graphene as a daemon process on the host, then you should wrap it in some script that does that. Or, at some point in the future, we could add a -d flag to the Graphene invocation which would daemonize Graphene as a whole.

boryspoplawski avatar Jul 07 '21 15:07 boryspoplawski
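
For illustration, a hypothetical -d mode (or an external wrapper around the Gramine invocation) could boil down to the classic double-fork daemonization before exec'ing the real command. This is only a sketch of the idea, not an existing Gramine feature, and the gramine-sgx command line in the comment below is just an example.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical usage: ./daemonize gramine-sgx memcached */
int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <command> [args...]\n", argv[0]);
        return 1;
    }
    if (fork() > 0)
        return 0;          /* the original parent returns to the shell immediately */
    setsid();              /* start a new session, detach from the controlling terminal */
    if (fork() > 0)
        _exit(0);          /* drop session leadership so a terminal can never be reacquired */
    int devnull = open("/dev/null", O_RDWR);
    dup2(devnull, 0);
    dup2(devnull, 1);
    dup2(devnull, 2);
    execvp(argv[1], &argv[1]);
    _exit(127);            /* exec failed */
}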

A list of related issues that have this "IPC leader does not wait for all processes to finish" bug:

  • https://github.com/oscarlab/graphene/issues/2426
  • https://github.com/oscarlab/graphene/issues/2531
  • https://github.com/oscarlab/graphene/issues/2541
  • https://github.com/oscarlab/graphene/issues/2542

We closed these issues since they are tracked here. Any new issues that boil down to this bug will be closed and mentioned in this meta-issue. The closing of other issues is to reduce the number of open issues, for easier management. (In other words, it doesn't invalidate the closed issues, it's just that it's very hard to navigate through a huge list of issues.)

dimakuv avatar Aug 11 '21 07:08 dimakuv

@dimakuv Wait, most of the linked issues exhibit the bug described here, but this bug is not the main reason they fail (it just causes some error messages; it does not make the tests fail - the failure happens before these error messages). gramineproject/graphene#2541 is a good example.

boryspoplawski avatar Aug 11 '21 15:08 boryspoplawski

Hi @dimakuv

I am attaching the list of tests that fail with the IPC issue here for verification purposes. Some of these tests are not listed in the original bug.

  • waitpid04
  • waitpid03
  • fcntl12
  • recvfrom01
  • send01
  • connect01
  • recv01
  • sendto01
  • fcntl12_64
  • kill09

anjalirai-intel avatar Aug 19 '21 05:08 anjalirai-intel

What is worth noting is that these tests do not fail: they just print an error message at the end, but still succeed.

boryspoplawski avatar Aug 19 '21 14:08 boryspoplawski

I am attaching the list of tests that fail with the IPC issue here for verification purposes. Some of these tests are not listed in the original bug.

I've verified that list - none of these fail due to the problem described in this issue. Some of them do sometimes print the error message described here, but it happens after the test has already finished and is unrelated to the actual result of the test (be it failure or success).

boryspoplawski avatar Sep 09 '21 15:09 boryspoplawski

On the original topic: I don't think implementing the 2nd option will be that easy. Graphene does not bother cleaning up most of its state at exit (it relies on the host to clean it up, which is not a problem at the moment), so this would require a lot of changes (e.g. closing all files, releasing all handles, cleaning up all threads and all user memory, and lots of other cleanups).

boryspoplawski avatar Sep 09 '21 15:09 boryspoplawski

We continue to see the issue as of Feb 21st. Copied below are a few console links from the opensource CI where the errors were seen: https://localhost:8080/job/graphene-20.04/2237/testReport/LibOS.shim.test.ltp/test_ltp/test_direct___test_ltp_waitpid04_/ https://localhost:8080/job/graphene-20.04/2256/testReport/LibOS.shim.test.ltp/test_ltp/test_direct___test_ltp_waitpid04_/ https://localhost:8080/job/graphene-20.04/2251/testReport/LibOS.shim.test.ltp/test_ltp/test_direct___test_ltp_kill09_/

jinengandhi-intel avatar Feb 21 '22 10:02 jinengandhi-intel

Hello! I wanted to bump this issue since an application I'm trying to run in Gramine (gramine-direct) is failing because the IPC leader exits before its child process is done (see log snippet below). Is there a workaround for this that I could test?

Gramine log snippet:

(libos_parser.c:1609:buf_write_all) [P2:T36:Listener] trace: ---- execve("/home/mmelara/app/bin/Worker", [/home/mmelara/app/bin/Worker,spawnclient,121,124,], [DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0,https_proxy=http://proxy:912/,]) ...
(libos_parser.c:1609:buf_write_all) [P1:T35:Listener] trace: ---- return from vfork(...) = 0x24
(libos_async.c:122:install_async_event) [P1:T35:Listener] debug: Installed async event at 1679016125655580
(libos_fs_lock.c:653:posix_lock_clear_pid) [P1:T23:Listener] debug: clearing POSIX locks for pid 1
(libos_async.c:327:libos_async_worker) [P1:libos] debug: Thread exited, cleaning up
(libos_init.c:303:init_stack) [P2:T36:Worker] debug: Allocated stack at 0x39f249e82000 (size = 0x100000)
(libos_sync_client.c:331:shutdown_sync_client) [P1:T23:Listener] debug: sync client shutdown: closing handles
(libos_sync_client.c:346:shutdown_sync_client) [P1:T23:Listener] debug: sync client shutdown: waiting for confirmation
(libos_sync_client.c:359:shutdown_sync_client) [P1:T23:Listener] debug: sync client shutdown: finished
(libos_ipc_worker.c:285:ipc_worker_main) [P1:libos] debug: IPC worker: exiting worker thread
(libos_async.c:353:libos_async_worker) [P1:libos] debug: Async worker thread terminated
(libos_exit.c:58:libos_clean_and_exit) [P1:T23:Listener] debug: process 1 exited with status 139
(libos_debug.c:94:remove_r_debug) [P2:T36:Worker] debug: removing file:/home/mmelara/app/bin/Listener at 0x2a9969884000
(libos_debug.c:94:remove_r_debug) [P2:T36:Worker] debug: removing file:/usr/local/lib/x86_64-linux-gnu/gramine/runtime/glibc/ld-linux-x86-64.so.2 at 0x2a996984e000
(libos_ipc_worker.c:72:ipc_leader_died_callback) [P2:libos] debug: IPC leader disconnected
(libos_ipc_child.c:23:ipc_child_disconnect_callback) [P2:libos] debug: Unknown process (vmid: 0x1) disconnected

...

(libos_parser.c:1609:buf_write_all) [P2:T36:Worker] trace: ---- writev(2, 0x39f249f80990, 10) = -32
(libos_exit.c:179:libos_syscall_exit_group) [P2:T36:Worker] debug: ---- exit_group (returning 127)
(libos_exit.c:105:thread_exit) [P2:T36:Worker] warning: error clearing POSIX locks: Connection reset by peer (ECONNRESET)
(libos_exit.c:112:thread_exit) [P2:T36:Worker] error: Sending IPC process-exit notification failed: Connection reset by peer (ECONNRESET)
(libos_sync_client.c:331:shutdown_sync_client) [P2:T36:Worker] debug: sync client shutdown: closing handles
(libos_sync_client.c:346:shutdown_sync_client) [P2:T36:Worker] debug: sync client shutdown: waiting for confirmation
(libos_sync_client.c:359:shutdown_sync_client) [P2:T36:Worker] debug: sync client shutdown: finished
(libos_ipc_pid.c:320:ipc_release_id_range) [P2:T36:Worker] debug: sending a request: [36..36]
(libos_ipc_pid.c:323:ipc_release_id_range) [P2:T36:Worker] debug: ipc_send_message: Connection reset by peer (ECONNRESET)
(libos_pid.c:177:release_id) [P2:T36:Worker] warning: IPC pid release failed

marcelamelara avatar Mar 17 '23 01:03 marcelamelara

@marcelamelara I sketched a workaround, but there is no PR/commit/branch that you could actually test: https://github.com/gramineproject/gramine/issues/1168#issuecomment-1425620994

Unfortunately, this is a pretty hard problem. Well, maybe someone will have time to create at least a workaround per my suggestion, so that people can try it out. @marcelamelara I don't know if you'll have the time and will to write the code, but feel free :)

dimakuv avatar Mar 17 '23 08:03 dimakuv

@dimakuv Hi, has the Gramine community been trying to solve this issue? It seems like quite an important problem.

bronzeMe avatar Jul 20 '23 07:07 bronzeMe

@bronzeMe No, we haven't worked on solving this issue (it would involve a serious design change in Gramine). This currently has a low priority for us.

dimakuv avatar Jul 20 '23 08:07 dimakuv

@dimakuv Thank you, but in real-world enterprise applications, there is often a shell script that triggers many fork operations. It is common for the IPC leader to terminate prematurely, causing the application to fail. So this issue is quite critical for enterprise users.

bronzeMe avatar Jul 20 '23 08:07 bronzeMe

@bronzeMe But why would Gramine be used in such cases? Gramine is generally not suited to run shell scripts.

dimakuv avatar Jul 20 '23 09:07 dimakuv

@dimakuv For example, running the Hive CLI inside Gramine.

bronzeMe avatar Jul 20 '23 09:07 bronzeMe

A quick look at https://www.devdoc.net/bigdata/hive-0.12.0/language_manual/cli.html shows me that Hive CLI is not typically used in background mode, so why would this GitHub issue affect your use cases?

dimakuv avatar Jul 20 '23 10:07 dimakuv

For example, when running the command gramine-sgx bash -c '$HIVE_HOME/bin/hive -f /home/my/hive-script.sql', the 'hive' script is a shell script that invokes multiple commands such as 'which', 'dirname', 'cat', and 'grep'. This often leads to the following issue: if the IPC leader process exits, the child processes cannot reconnect to it.

bronzeMe avatar Jul 20 '23 10:07 bronzeMe

@bronzeMe But what are you trying to achieve? Why do you need to execute which inside of the SGX enclave? What is your security model, and why is it important to execute these utilities inside SGX?

dimakuv avatar Jul 20 '23 10:07 dimakuv

What is the fundamental reason why it is not possible to execute the 'which' command and shell scripts in Gramine?

bronzeMe avatar Jul 20 '23 11:07 bronzeMe

reason why it is not possible to execute the 'which' command and shell scripts in Gramine?

The usual reason for such a problem is misconfiguration on the part of the user: probably the /bin/which executable is not included in trusted_files and/or the fs mounts. So if you think there's a bug in Gramine, please post your manifest, and the expected and actual results, in a new issue. Otherwise, I suspect there's not actually a bug.

shell script that invokes multiple commands such as 'which', 'dirname', 'cat', and 'grep'. This often leads to the following issue: if the IPC leader process exits, the child processes cannot reconnect to it.

This is likely incorrect. If you run gramine-sgx sh -c '...', then the IPC leader is the shell process, so unless you fork out of this shell (with &) and let the main process exit, IPC will work just fine. Again, if you think there's a bug, please post your script, manifest, and the expected and actual results, and we'll triage. Otherwise, I don't believe the assertion that this "often leads to the following issue".

Again, I suspect the problem is in the manifest, which makes this a support issue, not a bug.

woju avatar Jul 20 '23 12:07 woju

Just an interesting find: the following article describes the same problem (the "Docker PID 1 problem") as the one we have: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

The authors solved it by running such apps under their Python script that serves as the init (PID 1) process: https://github.com/phusion/baseimage-docker/blob/master/image/bin/my_init

I think what we didn't consider in this GitHub issue is the ability to have a minimal (but properly functioning) init for the Gramine environment.

dimakuv avatar May 15 '24 07:05 dimakuv
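
For context, the core of such a minimal init (in the spirit of phusion's my_init) is small: spawn the main command, reap every child that terminates, and exit with the main command's status only once all children are gone. Below is a plain-POSIX sketch of that idea; it is not existing Gramine code and not necessarily how a Gramine-internal init would look.

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <command> [args...]\n", argv[0]);
        return 1;
    }

    pid_t main_pid = fork();
    if (main_pid == 0) {
        execvp(argv[1], &argv[1]);
        _exit(127);   /* exec failed */
    }

    int main_status = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);   /* reap *any* child, not just main_pid */
        if (pid < 0)
            break;                             /* ECHILD: all children have exited */
        if (pid == main_pid)
            main_status = status;              /* remember the main command's status */
        /* When this process runs as PID 1 (e.g. inside a container), orphaned
         * grandchildren get reparented here and are reaped in this loop too, so
         * they do not linger as zombies. */
    }
    return WIFEXITED(main_status) ? WEXITSTATUS(main_status) : 128;
}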