network-policy-api icon indicating copy to clipboard operation
network-policy-api copied to clipboard

[Policy Assistant] Add support for k8s native workload traffic

Open gabrielggg opened this issue 10 months ago • 12 comments

Hi @huntergregory as we discussed on #168 , this is to solve #220 , i've implemented this so that a user can use both types of traffic (native, non native) in a same traffic.json file (please refer to https://github.com/gabrielggg/network-policy-api/blob/main/cmd/policy-assistant/examples/traffic.json file to see some examples). Now the trafficPeer is supporting both. I hope you like this and give some feedback.

For example, for a traffic input like this one :

{ "Source": { "Internal": { "Workload": {"daemonset": "fluentd-elasticsearch"}, "Namespace": "kube-system" } }, "Destination": { "Internal": { "Workload": {"deployment": "nginx-deployment2"}, "Namespace": "default" } }, "Protocol": "TCP", "ResolvedPort": 80, "ResolvedPortName": "serve-80-tcp" },{ "Source": { "Internal": { "Workload": {"daemonset": "fluentd-elasticsearch"}, "Namespace": "kube-system" } }, "Destination": { "Internal": { "PodLabels": {"pod": "b"}, "NamespaceLabels": {"ns": "y"}, "Namespace": "y" }, "IP": "192.168.1.100" }, "Protocol": "TCP", "ResolvedPort": 80, "ResolvedPortName": "serve-80-tcp" }

You get this output (considering that both the deployment and the daemonset have 2 replicas):

image image image

Please check this out!

gabrielggg avatar Apr 26 '24 06:04 gabrielggg

Hi @gabrielggg. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 26 '24 06:04 k8s-ci-robot

Deploy Preview for kubernetes-sigs-network-policy-api ready!

Name Link
Latest commit 8ec98852977ed2432c7b71359ba5f470776e57ed
Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-network-policy-api/deploys/6671fe846fe4320008f77644
Deploy Preview https://deploy-preview-227--kubernetes-sigs-network-policy-api.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Apr 26 '24 06:04 netlify[bot]

/ok-to-test

mattfenwick avatar May 01 '24 13:05 mattfenwick

/retest

gabrielggg avatar May 02 '24 16:05 gabrielggg

Hey @gabrielggg I accidentally published my review before I finished writing. When you can, please take a look at the edited text above and let me know how this sounds

huntergregory avatar May 06 '24 22:05 huntergregory

Hey @gabrielggg, taking a second look I’d approve your original idea, and my only suggestion is adding the pods field to InternalPeer. Let’s keep everything internal (as in internal to cluster). So like:

type TrafficPeer struct {
	Internal *InternalPeer
        // IP external to cluster
	IP          string
}

// Internal to cluster
type InternalPeer struct {
        // optional: if set, will override remaining values with information from cluster
        workload string

	PodLabels       map[string]string
	NamespaceLabels map[string]string
	Namespace       string
        // optional
        Pods      []*PodNetworking
}

Still would prefer to save the traffic command logic for a follow-up PR if you don’t mind? So this PR would just modify data structures and define a function that populates InternalPeer based on workload name.

huntergregory avatar May 09 '24 01:05 huntergregory

/retest

gabrielggg avatar May 09 '24 22:05 gabrielggg

Hey @huntergregory , thanks a lot for your review and guidance on this PR, it helped a lot!

Well, i implemented the new structs and created a func to Translate the TrafficPeer, as you suggested , i'm not going to touch the analyze command on this PR, i think it makes sense to treat that separately.

After some testing of the translate function i'm getting the following object.

For this traffic example (nginx deployment workload has 2 replicas):

[
    {
      "Source": {
        "Internal": {
            "Workload": "default/deployment/nginx-deployment2"
          }
        
      },
      "Destination": {
        "IP": "8.8.8.8"
      },
      "Protocol": "TCP",
      "ResolvedPort": 80,
      "ResolvedPortName": "serve-80-tcp"
    }   
]

Then after doing something like this:

b, err := json.Marshal(traffic.Source.Translate())
if err != nil {
fmt.Println(err)
return
}
fmt.Println(string(b))	

I'm getting this translated TrafficPeer object returned:

{
  "Internal": {
    "Workload": "default/deployment/nginx-deployment2",
    "PodLabels": {
      "app": "nginx",
      "app2": "tmp",
      "pod-template-hash": "5c496748cb"
    },
    "NamespaceLabels": {
      "kubernetes.io/metadata.name": "default",
      "labelname2": "value2"
    },
    "Namespace": "default",
    "Pods": [
      {
        "IP": "10.224.0.41",
        "IsHostNetworking": false,
        "NodeLabels": null
      },
      {
        "IP": "10.224.0.234",
        "IsHostNetworking": false,
        "NodeLabels": null
      }
    ]
  },
  "IP": ""
}

Please let me know what do you think about this!

gabrielggg avatar May 09 '24 22:05 gabrielggg

By the way, the next capability I had in mind for Policy Assistant is a continuation of this work. For #221, I think we should get all Deployments and DaemonSets in the cluster, then convert them to TrafficPeers. Maybe later we could support more than Deployment and DaemonSet.

Would you be able to help with this small task too since it is similar? Maybe we can reuse code from these other functions? I would be ok with doing that work in this PR too

huntergregory avatar May 13 '24 05:05 huntergregory

Hey @huntergregory , thanks a lot for the review and comments, i think i solved all the comments you made on the code (including support for pods and replicaset), please check it out and tell me what you think; I will also upload the input/output json files from my tests and the yamls from the workloads used for testing as you suggested.

Regarding your last comment, i will be working on that on this PR next.

gabrielggg avatar May 14 '24 18:05 gabrielggg

Hey @gabrielggg, were you still thinking to convert all Deployments/DaemonSets to TrafficPeers in this PR?

huntergregory avatar May 29 '24 21:05 huntergregory

Hi, @huntergregory , yes i'm working on it.

gabrielggg avatar May 31 '24 19:05 gabrielggg

Hey @huntergregory , sorry for the delay, it's done, i created 2 functions (1 for mapping deployments and 1 for mapping daemonsets; Maybe we can support othe workload types also), both are using the previously create Translate func and return a slice containing all the generated TrafficPeers, please check it out!

gabrielggg avatar Jun 05 '24 04:06 gabrielggg

Hey @huntergregory , i just added the support for the other workload types (statefulset, replicaset, pod(this one converts all the pods on the cluster to trafficpeers)), so now we are supporting all the workload types. I've created a separated func for each workload type. But maybe it will be better to put everything on 1 func to reuse the code. Please tell me what you think about it.

gabrielggg avatar Jun 06 '24 17:06 gabrielggg

Hey @huntergregory, thanks a lot for the review, i'm working on the observations but i have one doubt regarding this comment :

I was envisioning a function that gets all Pods in the cluster and creates only one TrafficPeer per Pod. Would you mind modifying your code to support this? Maybe handle all Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets, and finally remaining Pods.

The PodsToTrafficPeers() function does exactly that right now, it create a TrafficPeer per pod on the cluster. Maybe i'm not understanding what do you mean, if so, can you please elaborate a little bit. Thanks a los in advance.

gabrielggg avatar Jun 08 '24 03:06 gabrielggg

I was envisioning a function that gets all Pods in the cluster and creates only one TrafficPeer per Pod. Would you mind modifying your code to support this? Maybe handle all Deployments and DaemonSets first, then remaining ReplicaSets and StatefulSets, and finally remaining Pods.

The PodsToTrafficPeers() function does exactly that right now, it create a TrafficPeer per pod on the cluster. Maybe i'm not understanding what do you mean, if so, can you please elaborate a little bit. Thanks a los in advance.

Hey sorry @gabrielggg, I expressed this incorrectly. I mean for each Pod to be associated with exactly one TrafficPeer. For example, all Pods in a Deployment should be associated with the same TrafficPeer (with the Deployment workload type). Those same Pods should not be associated with a TrafficPeer with the ReplicaSet or Pod workload type. So there should be less TrafficPeers than Pods. The function logic would be something like:

  1. Create one TrafficPeer for each Deployment, and consider all Pods in the Deployment as "handled".
  2. Similar for DaemonSets.
  3. Get all ReplicaSets. Ignore the ReplicaSet if it's associated with a Deployment (equally, ignore Pods that have been "handled" already).
  4. Similar for StatefulSets.
  5. For remaining Pods, only create a TrafficPeer if a Pod has not been "handled".

huntergregory avatar Jun 08 '24 23:06 huntergregory

thanks for the clarification @huntergregory , I think that makes sense. I will be working on it.

gabrielggg avatar Jun 09 '24 04:06 gabrielggg

hey @huntergregory , i managed to get each Pod to be associated with exactly one TrafficPeer (you can see those inputs/outputs on the test files). Please check that out, also added all the test files used for testing including workload yamls and json outputs of each function, please check it out to and tell me what you think.

gabrielggg avatar Jun 17 '24 21:06 gabrielggg

hey @huntergregory , thanks for the review. I fixed the edge case and removed the log. Please check it out.

gabrielggg avatar Jun 18 '24 20:06 gabrielggg

/lgtm /approve

huntergregory avatar Jul 05 '24 19:07 huntergregory

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabrielggg, huntergregory

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jul 05 '24 19:07 k8s-ci-robot