api: add namespace adjustment

Open tych0 opened this issue 1 year ago • 7 comments

We are interested in running some parts of a pod in host or totally separate pid and network namespaces, so add an adjustment that allows for that.

tych0 avatar Nov 22 '24 17:11 tych0

Need more detail on the use case. What is a "part" of a pod? E.g. the network namespace: there is currently only one network namespace for the shared networks of the pod and all of its containers. That network namespace is either host, a pod-type namespace generated by the container runtime, or a user-namespace pod type as directed by the kubelet based on the pod spec, in which case the runc runtime engine creates the netns under the user namespace. Is this some sort of non-k8s use case for Linux distros to support pods, such as podman pods? We need to understand these use cases to understand where and how to manage these security / isolation changes, possibly on a per-client basis and possibly under a new non-k8s.io namespace.

mikebrow avatar Dec 02 '24 17:12 mikebrow

Sure,

> What is a "part" of a pod?

Specifically, one container in a pod. The rest of the pod we will leave as is. In fact,

> E.g. the network namespace, well there is currently only one network namespace for the shared networks of the pod and all of its containers.

it's exactly the network namespace that we want to change here. The rest of the pod will live in the same set of namespaces as it usually does.

> Is this some sort of non-k8s use case for linux distros

It is non-k8s in the sense that the network namespace we care about is created entirely outside of k8s, and there is no k8s infrastructure for managing it. It is unrelated to anything Linux-distro-specific, and has to do with Netflix's network architecture. There is an old Linux Plumbers Conference talk about the specifics here: https://lpc.events/event/11/contributions/932/attachments/908/1764/LPC%202021_%20Talking%20IPv6%20to%20IPv4%20Without%20NAT_2.pdf

tych0 avatar Dec 02 '24 20:12 tych0

> it's exactly the network namespace that we want to change here.

Actually it's the pid ns as well. We want to run in the parent pidns of the containers, so that we can see them to do seccomp() operations on them correctly.

tych0 avatar Dec 02 '24 21:12 tych0

thx for the detail

mikebrow avatar Dec 03 '24 21:12 mikebrow

I didn't pay attention to the open PRs and ended up doing another PR to adjust namespaces: #135. The key differences:

  • you pass only the added / modified / deleted namespaces, not the full list
  • ownership is per namespace (one plugin can add a cgroup namespace while another changes the network namespace)
  • there are some helper functions

I'm fine if we pick this PR in the end, I just want to be able to adjust namespaces :)

As for the security discussion, I thought NRI was considered part of the runtime, i.e. if you get NRI you get root. Today, by adjusting mounts or devices, you can likely already escape to the host, so I don't think adjusting namespaces or seccomp changes anything security-wise; it's already wide open.

champtar avatar Jan 23 '25 20:01 champtar

@samuelkarp @tych0 @dcantah @mikebrow @champtar @kad @etungsten I'd really like to try moving things forward both with these pending PRs and #137. Since I couldn't come up with anything better, I rolled a branch for testing with

  • a prototype implementation of configurable adjustment restrictions (described in #137)
  • cherry-picked #123
  • cherry-picked #124
  • also picked #118, as merging that requires a small change to #123 and #124
  • extra commits to implement adjustment restrictions with tests for namespaces and seccomp policies
  • extra commits to add adjustment setters for namespaces and seccomp policies for consistency with existing adjustments

If the approach proposed in #137 is anywhere close to something acceptable, then this branch should show what bits and pieces we'd need to get those controls in, together with (globally) restrictable namespace and seccomp adjustments (as a first step).

If you have some extra cycles, PTAL:

  • https://github.com/klihub/nri/commits/devel/restrictions%2Bnamespace%2Bseccomppolicy

And if you have any comments regarding #137, please chime in there as well.

klihub avatar Feb 07 '25 16:02 klihub

@klihub I had a very quick look at your branch. I would prefer that you use #135, as it's more fine-grained (separate add/modify/delete, per-namespace ownership, ...). In any case, can you open a PR so this can move forward? Small note: I initially just read the email notification and am only seeing the link to the code now, so it's better to post new messages than to edit.

champtar avatar Feb 19 '25 18:02 champtar

Closing in favor of #135 by @champtar, which implements a superset of this functionality.

klihub avatar Jul 14 '25 16:07 klihub