nri
api: add namespace adjustment
We are interested in running some parts of a pod in host or totally separate pid and network namespaces, so add an adjustment that allows for that.
Need more detail on the use case. What is a "part" of a pod? E.g. the network namespace: there is currently only one network namespace for the shared networks of the pod and all of its containers, and that network namespace is either host, type pod generated by the container runtime, or type user namespace pod as directed by kubelet based on the pod spec, in which case the runc runtime engine creates the netns under the user namespace. Is this some sort of non-k8s use case for linux distros to support pods such as podman pods? Need to understand these use cases to understand where and how to manage these security / isolation changes, possibly on a client basis and possibly under a new non k8s.io namespace.
Sure,
> What is a "part" of a pod.
Specifically, one container in a pod. The rest of the pod we will leave as is. In fact,
> E.g. the network namespace, well there is currently only one network namespace for the shared networks of the pod and all of its containers..
it's exactly the network namespace that we want to change here. The rest of the pod will live in the same set of namespaces as it usually does.
> Is this some sort of non-k8s use case for linux distros
It is non-k8s in the sense that the network namespace we care about is created entirely outside of k8s, and there is no k8s infrastructure for managing it. It is unrelated to any linux-distro specific thing, and has to do with Netflix's network architecture. There is an old Linux Plumbers Conference talk about the specifics here: https://lpc.events/event/11/contributions/932/attachments/908/1764/LPC%202021_%20Talking%20IPv6%20to%20IPv4%20Without%20NAT_2.pdf
> it's exactly the network namespace that we want to change here.
Actually it's the pid ns as well. We want to run in the parent pidns of the containers, so that we can see them to do seccomp() operations on them correctly.
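For illustration only, the change described above can be pictured at the OCI runtime-spec level, where a container's namespaces are a list of type/path entries: an entry with a path joins the existing namespace at that path, an entry without a path gets a newly created namespace, and a type missing from the list is inherited from the runtime. The sketch below uses a hypothetical local `Namespace` type and a hypothetical `setNamespacePath` helper, and the paths are placeholders; none of this is the NRI API or the code in this PR.

```go
package main

import "fmt"

// Namespace mirrors the shape of one OCI runtime-spec linux.namespaces entry.
// Hypothetical local type for illustration; not the NRI API.
type Namespace struct {
	Type string // e.g. "pid", "network", "mount"
	Path string // empty: create a new namespace; set: join the namespace at Path
}

// setNamespacePath returns a copy of nss in which the entry of the given type
// joins the namespace at path. The entry is appended if it was absent.
func setNamespacePath(nss []Namespace, typ, path string) []Namespace {
	out := make([]Namespace, 0, len(nss)+1)
	found := false
	for _, ns := range nss {
		if ns.Type == typ {
			ns.Path = path
			found = true
		}
		out = append(out, ns)
	}
	if !found {
		out = append(out, Namespace{Type: typ, Path: path})
	}
	return out
}

func main() {
	// Namespaces of one container in a pod. All paths are placeholders.
	container := []Namespace{
		{Type: "network", Path: "/run/netns/pod-sandbox"}, // shared pod netns
		{Type: "pid"},   // new pid namespace for this container
		{Type: "mount"}, // new mount namespace for this container
	}

	// Point only this container at an externally created netns and at a
	// different pid namespace. Other containers in the pod are untouched.
	adjusted := setNamespacePath(container, "network", "/run/netns/external")
	adjusted = setNamespacePath(adjusted, "pid", "/proc/1234/ns/pid")

	for _, ns := range adjusted {
		fmt.Printf("%-8s path=%q\n", ns.Type, ns.Path)
	}
}
```

The helper copies the list rather than mutating it, so the pod's original namespace set stays available for the other containers.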
thx for the detail
I didn't pay attention to the open PRs and ended up doing another PR to adjust namespaces: #135. The key differences:
- you pass only the added / modified / deleted namespaces, not the full list
- ownership is per namespace (one plugin can add a cgroup namespace while another changes the network namespace)
- there are some helper functions
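To make the first two points concrete, here is a rough sketch of delta-based adjustment with per-namespace ownership. Every name in it (`NamespaceDelta`, `Merger`, `Apply`, the plugin names, the paths) is made up for illustration and is not taken from #135 or the NRI API; it only shows the general idea of plugins sending just their changes and conflicts being detected per namespace type.

```go
package main

import "fmt"

// NamespaceDelta is a hypothetical per-namespace change sent by a plugin.
// Only namespaces a plugin wants to add, modify, or remove are listed.
type NamespaceDelta struct {
	Type   string // e.g. "network", "pid", "cgroup"
	Path   string // namespace to join; empty means "create a new one"
	Remove bool   // true: drop this namespace entry from the container
}

// Merger applies deltas from several plugins and tracks which plugin
// owns (has adjusted) each namespace type.
type Merger struct {
	namespaces map[string]string // namespace type -> path
	owners     map[string]string // namespace type -> plugin name
}

// NewMerger copies the container's initial namespaces (type -> path).
func NewMerger(initial map[string]string) *Merger {
	ns := make(map[string]string, len(initial))
	for t, p := range initial {
		ns[t] = p
	}
	return &Merger{namespaces: ns, owners: map[string]string{}}
}

// Apply validates all deltas first, then applies them, so a conflicting
// request leaves the state unchanged.
func (m *Merger) Apply(plugin string, deltas []NamespaceDelta) error {
	for _, d := range deltas {
		if owner, ok := m.owners[d.Type]; ok && owner != plugin {
			return fmt.Errorf("plugin %s: namespace %q already adjusted by plugin %s",
				plugin, d.Type, owner)
		}
	}
	for _, d := range deltas {
		m.owners[d.Type] = plugin
		if d.Remove {
			delete(m.namespaces, d.Type)
		} else {
			m.namespaces[d.Type] = d.Path
		}
	}
	return nil
}

func main() {
	m := NewMerger(map[string]string{"network": "/run/netns/pod-sandbox", "ipc": ""})

	// Different plugins adjusting different namespaces do not conflict.
	fmt.Println(m.Apply("plugin-a", []NamespaceDelta{{Type: "cgroup"}}))
	fmt.Println(m.Apply("plugin-b", []NamespaceDelta{{Type: "network", Path: "/run/netns/external"}}))

	// A third plugin touching an already-owned namespace is rejected.
	fmt.Println(m.Apply("plugin-c", []NamespaceDelta{{Type: "network", Remove: true}}))
}
```

With a full-list approach, the second plugin would instead have to resend every namespace, and ownership could only be tracked for the list as a whole.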
I'm fine if we pick this PR in the end, I just want to be able to adjust namespaces :)
As for the security discussion, I thought NRI was considered part of the runtime, i.e. if you get NRI you get root. Today, by adjusting mounts or devices you can likely already escape to the host, so I don't think adjusting namespaces or seccomp changes anything security-wise; it's already wide open.
@samuelkarp @tych0 @dcantah @mikebrow @champtar @kad @etungsten I'd really like to try moving things forward both with these pending PRs and #137. Since I couldn't come up with anything better, I rolled a branch for testing with
- a prototype implementation of configurable adjustment restrictions (described in #137)
- cherry-picked #123
- cherry-picked #124
- also picked #118, as merging that requires a small change to #123 and #124
- extra commits to implement adjustment restrictions with tests for namespaces and seccomp policies
- extra commits to add adjustment setters for namespaces and seccomp policies for consistency with existing adjustments
If the approach proposed in #137 is anywhere close to something acceptable, then this branch should show what bits and pieces we'd need to get those controls in, together with (globally) restrictable namespace and seccomp adjustments (as a first step).
If you have some extra cycles, PTAL:
- https://github.com/klihub/nri/commits/devel/restrictions%2Bnamespace%2Bseccomppolicy
And if you have any comments regarding #137, please chime in there as well.
@klihub I had a very quick look at your branch. I would prefer you use #135, as it's more fine-grained (separate add/modify/delete, per-namespace ownership, ...). In any case, can you open a PR so this can move forward? Small note: I initially only read the email notification and am just seeing the link to the code now, so it's better to post new messages than to edit existing ones.
Closing in favor of #135 by @champtar, which implements a superset of this functionality.