
How to create a mirror (span) port and move the mirror into a pod

l-rossetti opened this issue 3 years ago · 8 comments

Feature request

Is it possible to add a mirror port to the OVS bridge and then inject a veth pair into a pod (via a specific annotation) to allow that pod to perform network monitoring? I've seen that mirroring can be enabled either globally (mirroring the traffic of every pod) or selectively for specific pods via an annotation. But as far as I understand from the kube-ovn documentation, this traffic can be monitored only by directly accessing the Kubernetes node running the OVS bridge, while I need the mirrored traffic to be available in a specific pod.

If my understanding is correct, the only way I see to dump the mirrored traffic from a pod is to run a pod within the root network namespace of the node. But that way the pod would be able to "see" all the network interfaces present on the node.

Use case

To monitor the traffic of a set of pods from within a specific pod that runs a network monitoring application (e.g. ntopng). The pod executing the monitoring must be able to "see" only the mirror interface, not all the interfaces of the Kubernetes node (as it would if it ran in the host network namespace).

l-rossetti avatar Feb 09 '22 15:02 l-rossetti

It's an interesting use case. One manual method I came up with is to run your monitor pod in the container network and then use Linux commands to move the mirror0 interface into the pod's network namespace.

I think we can add a new annotation: when a pod carries it, the CNI moves mirror0 into it. Is this what you expect?
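
For reference, a minimal sketch of the manual approach, assuming mirroring is already enabled (e.g. via the kube-ovn-cni --enable-mirror option, so mirror0 exists on the OVS bridge) and the monitor pod's netns name is already known (both are assumptions here):

$ # Move the node-side mirror0 into the monitor pod's network namespace,
$ # then bring it up from inside that namespace.
$ ip link set mirror0 netns <NS_NAME>
$ ip netns exec <NS_NAME> ip link set mirror0 up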

oilbeater avatar Feb 09 '22 16:02 oilbeater

It's an interesting use case. One manual method I came up with is to run your monitor pod in the container network and then use Linux commands to move the mirror0 interface into the pod's network namespace.

That's exactly what I'm doing right now: I'm manually moving mirror0 into the monitoring pod's network namespace. But I'm facing an issue: do you have any clue how to move mirror0 from a pod configured with hostNetwork: true but hostPID: false? With this configuration I can mount the /var/run/netns hostPath so that ip netns list shows the namespaces, and I can then run ip link set mirror0 netns <NET_NS>, but I don't know which namespace to target (i.e. the one belonging to the pod running the monitoring):

cni-86407706-0954-f561-6849-d965713dcaee (id: 4)
cni-1c986c9b-79a2-f399-1bc5-b6330c469bf2 (id: 1)
cni-5a5cd4ce-8f39-016a-f17a-b50693bbe105 (id: 14)
cni-fd36118f-f851-f735-0122-faf7920a8ba6 (id: 15)
cni-c98d5df0-86af-877a-9fbb-4e5a80a5a4e5 (id: 12)

I think we can add a new annotation: when a pod carries it, the CNI moves mirror0 into it. Is this what you expect?

This is what I expect! Moreover, it would be very useful to allow the creation of multiple mirror ports. For example, I would like the traffic of pod0 and pod1 mirrored to mirror0, and the traffic of all pods (pod0, pod1, ..., podN) mirrored to mirror1. This way a tenant could monitor its own pods, while the SOC team could monitor the whole node. Do you think it's reasonable?
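
For illustration, a hedged sketch of what two such mirrors could look like with plain ovs-vsctl on br-int; the port names here are hypothetical and this is not something Kube-OVN sets up for you:

$ # Tenant mirror: only pod0/pod1 traffic goes to mirror0 (hypothetical port names).
$ ovs-vsctl -- --id=@p0 get port pod0-port -- --id=@p1 get port pod1-port \
    -- --id=@out get port mirror0 \
    -- --id=@m create mirror name=tenant-mirror \
       select-src-port=@p0,@p1 select-dst-port=@p0,@p1 output-port=@out \
    -- add bridge br-int mirrors @m
$ # SOC mirror: all traffic on the bridge goes to mirror1.
$ ovs-vsctl -- --id=@out get port mirror1 \
    -- --id=@m create mirror name=soc-mirror select-all=true output-port=@out \
    -- add bridge br-int mirrors @m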

l-rossetti avatar Feb 09 '22 18:02 l-rossetti

But I'm facing an issue: do you have any clue how to move mirror0

I ran ip link set mirror0 netns /proc/<Pod PID>/ns/net to move mirror0 into a running pod and then nsenter --net=/proc/<Pod PID>/ns/net ip link set mirror0 up, and it works. It doesn't need any special hostNetwork or hostPID configuration for the pod.
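
If you need the pod's PID from the node, one hedged way is via crictl (assuming a CRI runtime with crictl installed; the container ID is a placeholder):

$ # Look up the container's init PID, then move mirror0 into its netns.
$ PID=$(crictl inspect --output go-template --template '{{.info.pid}}' <container-id>)
$ ip link set mirror0 netns /proc/$PID/ns/net
$ nsenter --net=/proc/$PID/ns/net ip link set mirror0 up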

Different levels of mirroring are a much more complex piece of work; they require another set of CRDs and controllers to implement. As this only affects the OVS mirror config, I'd prefer a separate program to handle all the advanced mirror configuration.

oilbeater avatar Feb 10 '22 07:02 oilbeater

I ran ip link set mirror0 netns /proc/<Pod PID>/ns/net to move mirror0 into a running pod and then nsenter --net=/proc/<Pod PID>/ns/net ip link set mirror0 up, and it works. It doesn't need any special hostNetwork or hostPID configuration for the pod.

Are you running these cmds from the node or from a pod? Because in my case I'm running them from a pod with the following setup:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ovs-mirror-setup
  labels:
    tier: node
    app: ovs-mirror-setup
spec:
  selector:
    matchLabels:
      app: ovs-mirror-setup
  template:
    metadata:
      labels:
        tier: node
        app: ovs-mirror-setup
    spec:
      hostNetwork: true
      containers:
      - image: alpine
        command: ["/bin/sh"]
        args: ["-c", "/run.sh"]
        imagePullPolicy: IfNotPresent
        name: ovs-mirror-setup
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "SYS_ADMIN"]   # would like to avoid SYS_ADMIN cap
        volumeMounts:
        - mountPath: /var/run/openvswitch
          name: ovs-var-run
        - mountPath: /var/run/netns
          name: netns
      volumes:
      - name: ovs-var-run
        hostPath:
          path: /var/run/openvswitch
          type: ""
      - name: netns
        hostPath:
          path: /var/run/netns
          type: ""

By adding the SYS_ADMIN capability I'm able to run nsenter --net=/var/run/netns/<NS_NAME> ip a, but I cannot interact by process ID, while without SYS_ADMIN nsenter returns "operation not permitted".
The only way I can interact with the namespace while keeping only the NET_ADMIN cap and hostNetwork: true is to avoid nsenter and run ip link set mirror0 netns <NS_NAME>. This doesn't require access to the host PIDs, but I don't know how to figure out which namespace ID relates to which pod.

I'm trying to avoid having a pod with such high privileges.

l-rossetti avatar Feb 10 '22 15:02 l-rossetti

I just ran these cmds from the node. The cni-xxx-xxx files are created by containerd, but I'm not sure how to relate them to a container.

oilbeater avatar Feb 11 '22 16:02 oilbeater

Are you running these cmds from the node or from a pod? Because in my case I'm running them from a pod with the following setup: […]

I managed to get the netns path of a Pod on the host with the following steps:

  1. Get Pod container ID:
$ kubectl -n kube-system describe po coredns-78fcd69978-bzj6v
...
Containers:
  coredns:
    Container ID:  containerd://731a54f2ad2a5edb2fa0cab6cccb181490a6c2db90f9f88ee601940f0c1e3f6e
...
  2. Get the container's namespace path:
$ ctr -n k8s.io c info 731a54f2ad2a5edb2fa0cab6cccb181490a6c2db90f9f88ee601940f0c1e3f6e
...
            "namespaces": [
                ...
                {
                    "type": "network",
                    "path": "/proc/4133/ns/net"
                }
            ],
...
  3. Get the network namespace inode number:
$ readlink /proc/4133/ns/net
net:[4026532204]
  4. Get the netns file's inode number:
$ ls -i /var/run/netns/cni-1a91cdb1-a6f7-e265-9ce5-b7c0eed5c353
4026532204 /var/run/netns/cni-1a91cdb1-a6f7-e265-9ce5-b7c0eed5c353
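
These four steps can be scripted. A minimal sketch, assuming containerd and jq are available on the node and that the JSON field names match this containerd version (the container ID is the one from step 1):

$ # Extract the netns path from the OCI spec, then match it by inode
$ # against the files under /var/run/netns.
$ NS_PATH=$(ctr -n k8s.io c info 731a54f2ad2a5edb2fa0cab6cccb181490a6c2db90f9f88ee601940f0c1e3f6e \
    | jq -r '.Spec.linux.namespaces[] | select(.type == "network") | .path')
$ INODE=$(stat -Lc %i "$NS_PATH")
$ for f in /var/run/netns/*; do
    [ "$(stat -Lc %i "$f")" = "$INODE" ] && echo "$f"
  done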

Maybe you can make it work in the Pod by:

  1. Configuring k8s to ensure permission to access Pods;
  2. Mounting the containerd socket to access container information;
  3. Mounting the host /proc directory into the Pod (or maybe setting hostPID=true) to access the container's net namespace path; see the mount sketch below.
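
For steps 2 and 3, the extra mounts might look like this (a hedged sketch to add to the DaemonSet above; the paths assume a default containerd installation and the volume names are hypothetical):

        volumeMounts:
        - mountPath: /run/containerd/containerd.sock
          name: containerd-sock
        - mountPath: /host/proc
          name: host-proc
          readOnly: true
      volumes:
      - name: containerd-sock
        hostPath:
          path: /run/containerd/containerd.sock
      - name: host-proc
        hostPath:
          path: /proc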

If you are using Kube-OVN and have no access to host PIDs, you can get the Pod's netns path by querying the OVS interface database:

$ ovs-vsctl --no-heading --data=bare --columns=external_ids find interface external-ids:pod_namespace=kube-system external-ids:pod_name=coredns-78fcd69978-bzj6v
iface-id=coredns-78fcd69978-bzj6v.kube-system ip=10.16.0.16 ovn-installed=true pod_name=coredns-78fcd69978-bzj6v pod_namespace=kube-system pod_netns=/var/run/netns/cni-1a91cdb1-a6f7-e265-9ce5-b7c0eed5c353
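
Building on that query, a small sketch to pull out just the pod_netns value and move the mirror there (same pod as above; this assumes your iproute2 accepts a netns path, as in the earlier commands):

$ NETNS=$(ovs-vsctl --no-heading --data=bare --columns=external_ids \
    find interface external-ids:pod_namespace=kube-system \
         external-ids:pod_name=coredns-78fcd69978-bzj6v \
    | tr ' ' '\n' | sed -n 's/^pod_netns=//p')
$ ip link set mirror0 netns "$NETNS"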

Hope it helps.

zhangzujian avatar Feb 14 '22 03:02 zhangzujian

@zhangzujian thank you for your help and your clear answer.

I tried from a pod; for now it has all kinds of (bad) privileges I would like to avoid (privileged, hostPID, hostNetwork), since readlink requires special permissions gained with privileged: true. Unfortunately, I cannot find a match between the inode numbers of steps 3 and 4.

Maybe the issue is that the netns folder is mounted into the pod via the pod definition?!

Output:

/ # ctr -n k8s.io c info $containerID | grep proc
        "process": {
                "destination": "/proc",
                "type": "proc",
                "source": "proc",
                    "path": "/proc/2055326/ns/ipc"
                    "path": "/proc/2055326/ns/uts"
                    "path": "/proc/2055326/ns/net"
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
               .....
/ #
/ #     readlink -v /proc/$pid/ns/net
net:[4026532917]
/ #     for elem in /host/var/run/netns/*; do ls -i $elem; done
4026533357 /host/var/run/netns/cni-0dd81924-8d2a-13dd-5ba6-f5c06663a574
4026533882 /host/var/run/netns/cni-d41b94aa-51c3-02e8-0c53-b59ba13d5af1
465892218 /host/var/run/netns/cni-d5c57b87-e28d-797d-f869-15df9c9440f7
4026533190 /host/var/run/netns/cni-e07dc311-991a-79f6-053e-dd675602ae92
4026532725 /host/var/run/netns/cni-e0fd0d88-1d82-806c-a186-2c58e347f199

/ #     for elem in /host/var/run/netns/*; do ls -i $elem; done | grep 4026532917       # NOT FOUND

I feel that what I'm trying to achieve is not very reliable and goes against architectural best practices. I would have accepted this workaround for now, hoping for a better way to achieve it in the future.

It would be much better to delegate this to the CNI, though. In our project we chose OVS among all the networking technologies because of its mirroring capability; for us it is a pillar requirement. It would be very nice to fully support it in kube-ovn. I hope you can consider supporting it.

If you have any better idea, please tell me. Thanks in advance to you both.

l-rossetti avatar Feb 14 '22 20:02 l-rossetti

I feel that what I'm trying to achieve is not very reliable and goes against architectural best practices. […] It would be much better to delegate this to the CNI, though. […]

Maybe we can write the netns path to Pod annotations, e.g. ovn.kubernetes.io/netns_path=/path/to/netns. This would also help in debugging scenarios. @oilbeater Any suggestions?
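
If such an annotation existed, reading it would be a one-liner; this is purely hypothetical, sketching the proposal:

$ kubectl -n kube-system get pod coredns-78fcd69978-bzj6v \
    -o jsonpath='{.metadata.annotations.ovn\.kubernetes\.io/netns_path}'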

zhangzujian avatar Feb 15 '22 02:02 zhangzujian

Remote mirroring is now supported in the master branch: https://kubeovn.github.io/docs/v1.12.x/en/advance/ovn-remote-port-mirroring/
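
Roughly, and from memory (check the linked doc for the authoritative steps), OVN-level remote mirroring lets you send a logical switch port's traffic to a remote collector over GRE; the mirror name, tunnel key, addresses, and port name below are hypothetical:

$ # Assumes an OVN version with mirror support (22.12+).
$ ovn-nbctl mirror-add mirror1 gre 100 from-lport 172.20.1.92
$ ovn-nbctl lsp-attach-mirror pod0.default mirror1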

oilbeater avatar May 03 '23 07:05 oilbeater