firecracker
firecracker copied to clipboard
Add option to abort VMM when parent process dies
Changes
This adds a flag to set the parent death signal (SIGUSR2 in this instance) that the process will receive when its parent process exits before the VMM does. Receipt of this signal will cause the VMM to abruptly abort, much like the SIGILL signal. While a graceful shutdown would be preferable, since the parent process may have been controlling outside resources for Firecracker (disks, networking, etc.), it's indeterminate whether or not it is safe to continue running the VM.
Reason
If Firecracker is being monitored by a parent process that unexpectedly terminates, it will be abandoned up the process tree, likely to a process that doesn't know what do with it (such as init). This becomes even trickier if the process was running in a mount namespace that was controlled by the parent process, as the API socket is now inaccessible.
If the parent process was also keeping handles on other resources used by the Firecracker VMM, these could be re-used by new processes and cause conflicts with the now orphaned Firecracker.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.
PR Checklist
- [ ] If a specific issue led to this PR, this PR closes the issue.
- [x] The description of changes is clear and encompassing.
- [x] Any required documentation changes (code and docs) are included in this PR.
- [ ] API changes follow the Runbook for Firecracker API changes.
- [ ] User-facing changes are mentioned in
CHANGELOG.md
. - [x] All added/changed functionality is tested.
- [ ] New
TODO
s link to an issue. - [x] Commits meet contribution quality standards.
- [x] This functionality cannot be added in
rust-vmm
.
Hi @jseba,
Thanks for the contribution. I have some questions to understand a bit more the context:
You send SIGUSR2
to let Firecracker know that its parent process died. Please, correct me if I'm wrong, but this is not a standard mechanism, i.e. processes do not receive SIGUSR2 when their parent process dies. So, why not just send SIGINT to kill the Firecracker process?
I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler
Thanks for the explanation @jseba.
I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death
Just to clarify, the signal does not arrive from the OS, IOW it is not the kernel that sends the signal, it is some monitoring process (I guess), right?
in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler
I don't think that reserving a signal number is worth it for the sole purpose of logging the reason why Firecracker was shutdown. Is there any other cleanup/actions you have in mind?
Thanks for the explanation @jseba.
I used SIGUSR2 as the signal that firecracker gets from the OS on its parent's death
Just to clarify, the signal does not arrive from the OS, IOW it is not the kernel that sends the signal, it is some monitoring process (I guess), right?
No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.
in case there is anything that needs to be done differently on this path (e.g. logging that it's exiting due to the parent exiting) compared to the SIGINT handler
I don't think that reserving a signal number is worth it for the sole purpose of logging the reason why Firecracker was shutdown. Is there any other cleanup/actions you have in mind?
The log message was the only thing I could think of off hand behavior wise, but my thought process was that if the parent process is still alive and firecracker gets SIGTERM, even from another process, the parent process will still be running to get the result from waitpid
, so firecracker can gracefully shut itself down. If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.
I fixed the merge conflict and rebased back onto the tip of main
No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.
So, you are saying that when a process dies, Linux sends by itself SIGUSR2 to all its children processes? I was not aware of this mechanism. Could you share a link that documents that?
If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.
Like what? What is the worst thing that can happen? And why can this happen without Firecracker's involvement?
No, the signal comes from the kernel directly, no other management process is involved. That's why I selected a different signal to differentiate it since otherwise it is indistinguishable from the parent process interrupting it.
So, you are saying that when a process dies, Linux sends by itself SIGUSR2 to all its children processes? I was not aware of this mechanism. Could you share a link that documents that?
Linux by default sends nothing. The PR_SET_PDEATHSIG option tells it to send the designated signal to the calling process when the calling process's parent dies.
If the parent process dies and was managing something firecracker needed to run properly, it's no longer safe for firecracker to stay running and it needs to bail immediately.
Like what? What is the worst thing that can happen? And why can this happen without Firecracker's involvement?
The most obvious causes would be either a crash or an OOM kill from the kernel. Either way, the process terminates immediately with no chance for graceful cleanup. In our case, one of things being managed is the network namespace, device and address used for the Firecracker VM. If the parent process dies, the networking configuration used for the VM will be released to be used by another VM while the now-abandoned one continues running. This means another VM can be started on the host with the network configuration that the VM was using, meaning the abandoned VM will no longer have network connectivity. It is not safe to assume that simply terminating the VM gracefully will work in a timely manner, as services running in the VM could attempt to flush data and hang waiting for responses.
Linux by default sends nothing. The PR_SET_PDEATHSIG option tells it to send the designated signal to the calling process when the calling process's parent dies.
Thanks, I didn't know about this mechanism :)
Either way, the process terminates immediately with no chance for graceful cleanup
The graceful clean-up in this case just logging about the fact we are shutting down due to the parent process dying, other than that it's completely equivalent to just sending a SIGTERM to Firecracker.
It is not safe to assume that simply terminating the VM gracefully will work in a timely manner, as services running in the VM could attempt to flush data and hang waiting for responses.
That is true, but my point is that I don't think we need to change Firecracker itself to achieve this. In fact, I don't think we should change Firecracker at all since there's nothing that the Firecracker process needs to do other than shutting down.
What if, for example, you called prctl
after you fork from the current process but before exec
'ing in the Firecracker?