Improving Fork Performance with Zombie Pools
Description of the Problem
In the Linux-SGX PAL, fork is implemented by forking into a new process, creating a new SGX enclave, and restoring a memory checkpoint from the parent process/SGX enclave. As a result, applications that use the fork system call suffer high process-creation (and enclave-creation) overheads compared to their non-SGX counterparts. The main overhead stems from creating an SGX enclave, possibly with gigabytes of enclave memory, on every fork. This pattern is common among server applications such as Apache HTTP Server, nginx, or Redis.
Proposed Solution: Zombie Pools
We suggest amortizing the process-creation time over several fork invocations. We observe that a forked process could be reused after its exit system call by another fork in a different process. This would allow instantiating a new Graphene process without reinitializing the SGX enclave or creating a new process; it only requires cleaning up the old state and restoring a new checkpoint.
While this idea can be implemented in a general way with a global zombie pool, we think it should initially be implemented as a per-process zombie pool. This has several advantages: it simplifies the implementation, does not require global coordination, and still covers the most impacted workloads, such as server applications.
When Graphene starts, it creates an enclave as usual. When this process forks for the first time, the fork works as it does today (creating a new process, creating a new enclave, and restoring a checkpoint). Once the child has finished and called exit(), instead of exiting, the child notifies the parent about the exit and waits for a response from the parent. On the parent side, the exit message from the child results in storing the zombie child in a free list. This free list is consulted when a new fork occurs within the parent: Graphene reuses the zombie child by issuing a new checkpoint, skipping both the creation of a new process via fork and the creation of a new SGX enclave.
We assume that the child exited with a successful exit code. In addition, this only works for fork; we do not consider exec, since exec loads a different manifest with a different layout and MRENCLAVE. Using the technique for exec is possible, but requires additional considerations such as per-manifest zombie pools. Also, once a parent exits, it informs all its children to exit. This limits the length of zombie-pool chains to a single child. While this limits the applicability, we think it is important not to leave excessive amounts of resources unused. We therefore suggest the following lifecycle for processes:
- Normal mode: the process runs as before
  - Transition to zombie mode on exit
- Zombie mode: the process has exited and waits for a message from the parent
  - Transition to die, if the parent no longer exists
  - Transition to normal mode, if the parent sends a new checkpoint
  - Transition to die, if the parent sends an exit message
Implementation Details
We suggest an implementation in the library-OS (LibOS) layer, so that the optimization is available to all PAL layers to improve their fork performance. We structure the work into four main tasks and briefly describe their possible implementation.
- Keep a list of zombies
  - Define the list of zombies in `shim_process.h` in `struct shim_process`
  - Intercept the exit message of a child (`shim_ipc_child.c`, in function `ipc_cld_exit_callback`)
    - Add this child to the zombie list
    - May consider changing the message to say that the child is going to zombie mode instead of the exit callback (to differentiate between an error and a normal exit)
    - Currently the message includes the exit code and the term signal (may not be necessary)
- Don't exit, go to zombie mode
  - Intercept the exit of a process in `shim_exit.c` (`libos_exit` and `libos_clean_and_exit`)
    - Kill all children (if any exist)
      - Send a term message to zombie children (new message)
    - Keep the IPC connection to the parent
      - Split the `del_all_ipc_ports` implementation into parent and all other IPC ports
    - May need PAL cleanup of state
      - PAL objects may require cleanup
    - Wait on IPC to the parent
      - In `libos_clean_and_exit`, wait for a parent message to either terminate or restart the process
- Create a child from a zombie
  - Intercept fork and checkpoint restore (`shim_checkpoint.c`, in `create_process_and_send_checkpoint`)
    - Check that the call is for fork and not exec (argument `exec` is not set)
    - Find a zombie in the zombie list
      - If a zombie is available, checkpoint and restore
      - Otherwise, create a new process and go through the normal creation
- Manifest option for fork pooling
  - Define `libos.fork_pooling = 0/1`
  - The implementation should only be enabled when `libos.fork_pooling = 1`
  - Define a global variable in `shim_init.c` and set it in `shim_init.c` (~ line 500)
    - Use `toml_int_in` to extract the integer value of `libos.fork_pooling`
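In an application's manifest, the proposed option would be a single line (key name as proposed above; it does not exist yet):

```toml
# Enable per-process zombie pooling for fork (proposed option; off by default)
libos.fork_pooling = 1
```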
What does this not solve?
The described approach and its suggested implementation are limited in two ways. First, it does not support exec, which is common in applications that rely on the libc system() function to spawn new shell executions. Second, it does not allow chains of zombie pools to exist. As a result, the particular case where an application executes sh -c ldconfig in a new process is not sped up (only the first invocation of sh may use a zombie from a pool; the subsequent fork into ldconfig has no zombie). While we think such cases are common, they typically appear a few times at the beginning of an application's run, whereas forking can occur throughout its lifetime. In addition, the approach could eventually be extended to cover these cases and further improve performance for more use cases.
We would like to solicit your feedback on the proposal.
Thanks, Anjo.
This helps greatly for applications that frequently fork children during runtime, e.g., web/database applications that fork a child for every client connection (PostgreSQL). For one such application, we observe that a typical run (with ~500 forks) takes 3 hours instead of 5 minutes (36x runtime overhead) due to enclave creation on every fork.
When recycling a zombie, its state needs to be re-initialized before receiving a checkpoint, i.e., brought into a known (initial) state. There are several ways to do this.
For memory, one simple way is to stash the original image of PAL and LibOS (and the app binary image) in a reserved, read-only area and copy it into the actual area. If we can trust the files of PAL and LibOS (e.g., by checking their hash values), re-reading them into memory would be another option. This implies some small executable is needed in addition to PAL and LibOS to handle it. Another approach is to make LibOS release all unused memory on shim_do_exit() (or re-initialization). I'm not sure how hard that would be without auditing the code in such a context.
For other resources, e.g. open files, they need to be released correctly on exit. LibOS is already tracking them to some extent.
Once re-initialization is implemented and the hash value for the executable is known, the zombie approach could also be applied to the exec case.
Some random thoughts:
- In the PAL/Linux-SGX case, shared libraries that are known to be loaded (or any files known to be read) could also be loaded initially when building the enclave in memory. Then the ocalls to read shared libraries could be eliminated. This would also shorten normal startup time. In any case, measurements (how much ocalls slow down fork/startup) should be done.
- Do we want to control the total number of zombies? Given the above example, that would be premature optimization. A simple timeout to kill unused zombies would be enough at first.
I think we'll have to wait with implementing this until we rewrite IPC (#2107).