Support `FDSTORE=1` (sd_pid_notify_with_fds) to restart a container without losing active TCP connections

Open eriksjolund opened this issue 1 year ago • 0 comments

I would like to have a container use

sd_pid_notify_with_fds(0, 0, "FDSTORE=1\nFDNAME=foobar", &fd, 1);

(see man sd_notify) to store an active TCP socket. It would then be possible to restart a container that has an active TCP connection without the container losing the connection. (Containers generally don't use sd_pid_notify_with_fds(), so this would only work for containers that support it.)
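
To make concrete what such a call puts on the wire, here is a minimal C sketch that sends the same kind of message without libsystemd: one datagram to the AF_UNIX socket named by $NOTIFY_SOCKET, with the descriptor attached as SCM_RIGHTS ancillary data. The helper name notify_with_fd is made up for illustration, and the sketch leaves out parts of the real library function (for example sending on behalf of another PID), so read it as an approximation and not as how sd_pid_notify_with_fds() is implemented. Anything that relays notify messages between the container and systemd would have to pass this ancillary data along as well.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: send `state` plus one file descriptor to the socket
 * named by $NOTIFY_SOCKET. Returns 0 on success, -1 on error (errno set). */
int notify_with_fd(const char *state, int fd_to_store)
{
    const char *path = getenv("NOTIFY_SOCKET");
    if (path == NULL || (path[0] != '/' && path[0] != '@') || path[1] == '\0') {
        errno = EINVAL;
        return -1;
    }

    struct sockaddr_un addr;
    size_t path_len = strlen(path);
    if (path_len >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, path_len);
    if (path[0] == '@')
        addr.sun_path[0] = '\0';      /* '@' marks an abstract socket name */

    /* Control buffer sized and aligned for exactly one descriptor. */
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;
    memset(&control, 0, sizeof(control));

    struct iovec iov = { .iov_base = (void *)state, .iov_len = strlen(state) };
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_name = &addr;
    /* Path sockets include the trailing NUL in the length, abstract ones do not. */
    msg.msg_namelen = offsetof(struct sockaddr_un, sun_path) + path_len
                      + (path[0] == '@' ? 0 : 1);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    /* Attach the descriptor as SCM_RIGHTS ancillary data. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_store, sizeof(int));

    int sock = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (sock < 0)
        return -1;
    ssize_t n = sendmsg(sock, &msg, MSG_NOSIGNAL);
    int saved_errno = errno;
    close(sock);
    errno = saved_errno;
    return n < 0 ? -1 : 0;
}
```

With this helper, the call from the issue would read notify_with_fd("FDSTORE=1\nFDNAME=foobar", fd).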

Scenario: /usr/bin/testserver in the container image IMG:1 supports socket activation. A client on the internet connects. The testserver is started and calls accept() on the leaked-in socket. A new container image IMG:2 is released. The sysadmin wants to upgrade to IMG:2 without having to disconnect the active TCP connection.

The same scenario described in more detail:

  1. sudo useradd test1
  2. sudo machinectl shell test1@
  3. podman image tag IMG:1 tmptag:latest
  4. podman create --rm --name test --network none tmptag:latest /usr/bin/testserver
  5. mkdir -p ~/.config/systemd/user
  6. podman generate systemd --name --new test > ~/.config/systemd/user/test.service
  7. create the file ~/.config/systemd/user/test.socket with the following contents:
    [Unit]
    Description=test server
    [Socket]
    ListenStream=0.0.0.0:3000
    [Install]
    WantedBy=default.target
    
  8. systemctl --user start test.socket
  9. a client on the internet connects to TCP port 3000.
  10. /usr/bin/testserver is started with the leaked-in socket (from systemd socket activation)
  11. the testserver calls accept() on the socket
  12. A new container image IMG:2 is available. The sysadmin runs podman image tag IMG:2 tmptag:latest
  13. the sysadmin somehow informs the testserver that it needs to send the active TCP socket file descriptor to systemd with FDSTORE=1
  14. testserver sends the active TCP socket file descriptor to systemd and gives it the name foobar: sd_pid_notify_with_fds(0, 0, "FDSTORE=1\nFDNAME=foobar", &fd, 1);
  15. testserver terminates with an unclean exit code (see man systemd.service) so that systemd will try to restart the service. (Alternatively the sysadmin could also run systemctl --user restart test.service).
  16. /usr/bin/testserver is started again but this time it also inherits the file descriptor that was previously stored with FDSTORE=1.
  17. testserver calls sd_listen_fds_with_names()
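
The last two steps can be sketched in C without libsystemd. On restart, systemd describes the inherited descriptors through the environment variables LISTEN_PID, LISTEN_FDS and LISTEN_FDNAMES, and the descriptors themselves start at number 3. The function name find_inherited_fd and the constant FIRST_INHERITED_FD below are illustrative, not part of any library; sd_listen_fds_with_names() does more validation than this sketch. Note also that systemd only keeps stored descriptors when the service unit sets FileDescriptorStoreMax= to a value above its default of 0.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Inherited descriptors start at 3, see sd_listen_fds(3). */
#define FIRST_INHERITED_FD 3

/* Hypothetical helper: return the inherited descriptor whose entry in
 * $LISTEN_FDNAMES equals `wanted`, or -1 if there is none. */
int find_inherited_fd(const char *wanted)
{
    const char *pid_s = getenv("LISTEN_PID");
    const char *fds_s = getenv("LISTEN_FDS");
    const char *names = getenv("LISTEN_FDNAMES");
    if (pid_s == NULL || fds_s == NULL || names == NULL)
        return -1;

    /* The variables only apply to the process they were set for. */
    if (strtol(pid_s, NULL, 10) != (long)getpid())
        return -1;

    long count = strtol(fds_s, NULL, 10);
    size_t wanted_len = strlen(wanted);
    const char *p = names;

    /* LISTEN_FDNAMES is a colon-separated list, one name per descriptor. */
    for (long i = 0; i < count; i++) {
        size_t len = strcspn(p, ":");
        if (len == wanted_len && strncmp(p, wanted, len) == 0)
            return FIRST_INHERITED_FD + (int)i;
        if (p[len] == '\0')
            break;                    /* fewer names than descriptors */
        p += len + 1;
    }
    return -1;
}
```

In the scenario above, testserver would call find_inherited_fd("foobar") at startup and, if the result is not -1, resume serving the stored connection instead of waiting for accept().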

In the above example it was assumed that /usr/bin/testserver is stateless (except for having the active TCP connection). In a more realistic scenario /usr/bin/testserver would also need to store its internal application state before restarting. The normal file system could be used to save such state, or even better, memfd_create(2) could be used. Quoting man sd_notify: "Application state can either be serialized to a file in /run/, or better, stored in a memfd_create(2) memory file descriptor."
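
As a rough illustration of the memfd approach, the following C sketch writes a blob of application state into a memory file descriptor and reads it back. The function names and the "testserver-state" label are invented for this example, and the state format is left entirely to the application. The resulting descriptor is what would be handed to systemd with FDSTORE=1, alongside the TCP socket.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous memory file containing `len`
 * bytes of `state`. Returns the descriptor, or -1 on error. */
int save_state_to_memfd(const void *state, size_t len)
{
    int fd = memfd_create("testserver-state", MFD_CLOEXEC);
    if (fd < 0)
        return -1;

    const char *p = state;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return fd;
}

/* Hypothetical helper: read up to `cap` bytes from the start of the memory
 * file. Returns the number of bytes read, or -1 on error. */
ssize_t load_state_from_memfd(int fd, void *buf, size_t cap)
{
    /* pread() leaves the file offset untouched, so the position left
     * behind by the writer does not matter. */
    return pread(fd, buf, cap, 0);
}
```

After a restart, the new testserver process would locate the inherited memfd by the name it was stored under and call load_state_from_memfd() on it before resuming work.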

In other words, it would be good if conmon would also support sending a memfd with FDSTORE=1.

Maybe even more file descriptor types could be allowed to be sent? An idea: a new Podman command-line option could adjust what type of file descriptors are allowed to be sent.

Previous discussion:

  • https://github.com/containers/podman/discussions/13570#discussioncomment-2449912

Extra note: I don't have any direct need for this feature right now. I just think it could be a useful feature.

eriksjolund, Mar 19 '23 19:03