firecracker-go-sdk
firecracker-go-sdk copied to clipboard
Firecracker socket resets connection on shutdown
while trying out the machine.Shutdown()
call from the go-sdk, the firecracker socket resets the connection and also shuts down the VM disgracefully due to the error.
I also used the new ForwardSignals
set to []os.Signal{}
to make sure firecracker does not listen to any interrupt signals coming from the host operating system.
Unable to send CtrlAltDel: Put "http://localhost/actions": read unix @->/run/firecracker.socket: read: connection reset by peer component=microvm-0
DEBU[2020-04-21T20:20:32] instance Destroy() component="microvm-0 (867e7d58)" error="Put \"http://localhost/actions\": read unix @->/run/firecracker.socket: read: connection reset by peer"
INFO[2020-04-21T20:20:32] Finished shutdown of VMM component=vmm
WARN[2020-04-21T20:20:33] firecracker exited: signal: interrupt component=microvm-0
DEBU[2020-04-21T20:20:33] closing the exitCh 1 error occurred:
* signal: interrupt
component=microvm-0
However when I try using the same kernel image with firectl
, I am able to cleanly shutdown using the curl command to the firecracker socket
ps aux | grep firecracker
sriramv+ 12276 7.4 0.5 530096 40892 pts/0 Sl+ 00:57 0:05 /usr/local/bin/firecracker --api-sock /home/sriramvenkat/.firecracker.sock-12268-81
sriramv+ 12366 0.0 0.0 112712 976 pts/1 S+ 00:58 0:00 grep --color=auto firecracker
[sriramvenkat@containedwaned ~]$ curl --unix-socket /home/sriramvenkat/.firecracker.sock-12268-81 -i -X PUT "http://localhost/actions" -H "accept: application/json" -H "Content-Type: application/json" -d "{
> \"action_type\": \"SendCtrlAltDel\"
> }"
HTTP/1.1 204
Server: Firecracker API
Connection: keep-alive
[sriramvenkat@containedwaned ~]$ ps aux | grep firecracker
sriramv+ 12276 5.3 0.5 530100 40892 pts/0 Sl+ 00:57 0:05 /usr/local/bin/firecracker --api-sock /home/sriramvenkat/.firecracker.sock-12268-81
sriramv+ 12391 0.0 0.0 112712 976 pts/1 S+ 00:59 0:00 grep --color=auto firecracker
What is the configuration of your Linux kernel? Shutdown internally uses SendCtrlDel, which is not supported by some Linux kernels. See https://github.com/firecracker-microvm/firecracker/issues/1095 for the detail.
@sriramsv Do you still see this issue?
I am experiencing the same behaviour as described here.
Running the above curl command against the socket causes a reboot and a clean shutdown:
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Stopped Create System Users.
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Reached target System Shutdown.
[ OK ] Reached target Late Shutdown Services.
[ OK ] Finished System Reboot.
[ OK ] Reached target System Reboot.
[ 39.516404] reboot: Restarting system
2024-04-22T15:08:31.644600842 [anonymous-instance:main] Vmm is stopping.
2024-04-22T15:08:31.644866100 [anonymous-instance:main] Vmm is stopping.
2024-04-22T15:08:31.775123014 [anonymous-instance:main] Firecracker exiting successfully. exit_code=0
When using the SDK and calling Shutdown(), I see:
ERRO[0013] Unable to send CtrlAltDel: Put "http://localhost/actions": read unix @->/var/run/sockets/firecracker-621912026/firecracker.sock: read: connection reset by peer
2024/04/22 15:09:17 Error shutting down of-k3s-1: Put "http://localhost/actions": read unix @->/var/run/sockets/firecracker-621912026/firecracker.sock: read: connection reset by peer
In the guest, I see no new output on the serial console.
firecracker.Machine
// Shutdown requests a clean shutdown of the VM by sending CtrlAltDelete on the virtual keyboard
func (m *Machine) Shutdown(ctx context.Context) error {
m.logger.Debug("Called machine.Shutdown()")
return m.sendCtrlAltDel(ctx)
}
For some reason, there have been no SDK releases since Sept 2022, but I've tried using that and the latest commit in the main branch. I got the same results in either case.
Firecracker v1.7.0
@swagatbora90 would you have any suggestions here? I saw you'd made commits into the SDK repository recently, feel free to ping someone else if needed.
It seems like the bug is potentially a regression with signal passing, since when I invoke the same codepath through an internal HTTP endpoint, the shutdown method works as expected.
When I run control + c on the controller program, the signal appears to be propagated directly to Firecracker causing it to exit immediately meaning it's unable to handle the HTTP request.
I overrode the list of signals to make sure that nothing that would cause a termination would get passed through and can see that in the Go SDK:
DEBU[0001] Setting up signal handler: [user defined signal 1]
The debug logs show that this signal propagation is faulty, despite only being configured to send through USR1, the control + c went to the Firecracker process and terminated it before the controller could clean it up.
DEBU[0061] Called machine.Shutdown()
WARN[0061] firecracker exited: signal: interrupt
When I run control + c on the controller program, the signal appears to be propagated directly to Firecracker causing it to exit immediately meaning it's unable to handle the HTTP request.
When you press Ctrl+C in the terminal, it sends the signal to all processes in the current process group rather than just a single process. I think the Go SDK would need to allow spawning firecracker/jailer processes with syscall.SysProcAttr{Setpgid: true}
so that firecracker/jailer can be placed in their own process group, which would prevent them from getting signaled. Currently the SDK doesn't provide any options for customizing the exec.Cmd
for firecracker/jailer, so there would need to be a patch to allow this
That's a fair point, and what we eventually found out too. Thanks for commenting.