A Solution to Fixing Containers when lxcfs Crashes

Open deleriux opened this issue 2 years ago • 7 comments

Hello all,

I briefly mentioned last week that I had a solution for the "Transport endpoint not connected" errors that containers hit when lxcfs crashes, without having to restart every affected container.

I've uploaded the code as-is here: https://github.com/deleriux/lxcfs-reattach

I've tested on Ubuntu 22 and Ubuntu 16 (with updated kernel).

This works by using the system calls introduced in Linux 5.2 that split the mount process into multiple steps, see:

https://lwn.net/Articles/759499/

You can leverage this step-by-step approach to obtain a handle on the source mount in the host namespace, then switch to the container namespace and attach that mount at the target path there.

The algorithm is basically as follows:

  1. Enter the mount namespace lxcfs is running in.
  2. Locate the particular bind mount you are interested in fixing, e.g. /var/lib/lxcfs/proc/meminfo
  3. Call open_tree() on the path to obtain a mount_fd representing this mount point.
  4. Enter the container's mount namespace (you've now snatched the mount FD from the host's namespace!)
  5. Call umount() on the container's path to /proc/meminfo
  6. Call move_mount() against /proc/meminfo to reattach this mountpoint to the container's VFS.

The code for this part is kept in https://github.com/deleriux/lxcfs-reattach/blob/main/container.c#L145 .

The remaining code is mostly dedicated to heuristics for finding containers to mount and mountpoints to monitor. I'm pretty sure it is littered with stupid bugs, but it works.
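To give an idea of what such a heuristic can look like, here is a minimal sketch that lists the lxcfs mounts in a mounts file. It is not the logic used in the repository: count_lxcfs_mounts is a hypothetical name, and matching on the filesystem type "fuse.lxcfs" is an assumption about how lxcfs mounts show up in /proc/<pid>/mounts.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every mountpoint of type "fuse.lxcfs"
 * found in `mounts_path` (for example /proc/1234/mounts) and return
 * how many there were, or -1 if the file cannot be opened. */
static int count_lxcfs_mounts(const char *mounts_path)
{
    FILE *fp = setmntent(mounts_path, "r");
    if (fp == NULL)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        /* Assumption: lxcfs entries carry this filesystem type. */
        if (strcmp(m->mnt_type, "fuse.lxcfs") == 0) {
            printf("%s\n", m->mnt_dir);
            count++;
        }
    }

    endmntent(fp);
    return count;
}
```

Running this against the mounts file of a container's init process would give the candidate paths (such as /proc/meminfo) that may need rebinding.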

The process supports a monitor mode that uses epoll() on every discovered /proc/<pid>/mounts file to watch mounts come and go. If a qualifying mountpoint is unmounted and then remounted (for example, when lxcfs is restarted), the process detects it and issues a request to test, then rebind, any mountpoints that no longer work.

If lxcfs crashes and is not restarted, the tool can't help, but as soon as a new instance comes up it should rebind the mountpoints pretty quickly.

My code doesn't (and can't) distinguish which lxcfs process to use when rebinding mountpoints; it merely selects the 'best/first' working one and runs with it. This comes up particularly with LXD installed as a snap, which tends to run its own lxcfs alongside the system's lxcfs, which may also be running.

I'm not suggesting this is the best or only solution to this problem (or, for that matter, that my code is suitable for this project in its current form), but the algorithm to fix running containers is pretty straightforward and tends to work flawlessly without being too disruptive.

deleriux, Jan 18 '23 14:01

Hi @deleriux

That's a good idea. As I said before, we are currently working on an internal lxcfs mechanism to recover from crashes, but this is a good solution for cases where rebooting all containers is problematic.

mihalicyn, Jan 18 '23 15:01

We'll need to be very, very careful when doing something like that, as root in the container can mess with the mount namespace. So we may be tricked into traversing some symlinks, get locked up by hitting an intentionally broken FUSE mount, ...

That's the reason we never invested too much effort into injecting LXCFS mounts into an existing instance. Even prior to the new mount API, we had a workaround using mount propagation to add/remove mounts from containers, but that still had the same security concerns attached to it.

I certainly feel a lot better about the current plan from @mihalicyn to allow recovering from a lxcfs crash by re-attaching to the existing FUSE mounts.

stgraber, Jan 18 '23 18:01

@mihalicyn we can re-use this one to track the FUSE re-attach work

stgraber, Sep 29 '23 15:09

> @mihalicyn we can re-use this one to track the FUSE re-attach work

@stgraber so, would this fix be added to next release?

zhoushuke, Oct 10 '23 08:10

> > @mihalicyn we can re-use this one to track the FUSE re-attach work
>
> @stgraber so, would this fix be added to next release?

I would say it won't be addressed in the next release. We need to make some changes in the Linux kernel as part of this work. But it will definitely be implemented in LXCFS.

Do you have any issues with LXCFS right now?

mihalicyn, Oct 10 '23 09:10

@mihalicyn any update?

zhoushuke, Apr 19 '24 09:04