helm-charts icon indicating copy to clipboard operation
helm-charts copied to clipboard

[tempo] Added Network Policy capability

Open Sheikh-Abubaker opened this issue 1 year ago • 3 comments
trafficstars

Fixes #2890

Sheikh-Abubaker avatar Jan 21 '24 20:01 Sheikh-Abubaker

@zanhsieh please checkout I've done the required changes.

Sheikh-Abubaker avatar Jan 22 '24 09:01 Sheikh-Abubaker

@zanhsieh I think I found some missing entries that should be in values.yaml but are not, please correct me if I am wrong, this code section below in charts/grafana/templates/networkpolicy.yaml is defined to include any labels and annotations defined in values.yaml right ? but when I searched through values.yaml I was unable to find any entries for labels though I can see entry for annotations but it is commented out, what do you think about this ?

    {{- with .Values.labels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
  {{- with .Values.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}

Sheikh-Abubaker avatar Jan 22 '24 11:01 Sheikh-Abubaker

@Sheikh-Abubaker That's why the code go with {{ with }}.

zanhsieh avatar Jan 24 '24 07:01 zanhsieh

Hi, is there anything else pending for this ? Or just an additional review required ? Thanks a lot @Sheikh-Abubaker.

hkailantzis avatar Mar 12 '24 15:03 hkailantzis

@hkailantzis I've added this resource from referring to : https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/networkpolicy.yaml

I have a doubt in this part of the code:

egress:
    {{- if not .Values.networkPolicy.egress.blockDNSResolution }}
    - ports:
        - port: 53
          protocol: UDP
    {{- end }}

As for the grafana networkpolicy the port is defined as 53, which port should be assigned for tempo networkpolicy ? Please correct me if I am wrong but as per my understanding since it a general K8s resource then we should utilize the same port i.e 53 ?

Sheikh-Abubaker avatar Mar 13 '24 10:03 Sheikh-Abubaker

@Sheikh-Abubaker , I'm really not sure about this tbh. I suspect is something general on networking, like when blocking DNS resolution which runs on port 53. is related with Firewalls etc. ie: https://library.netapp.com/ecmdocs/ECMP1155586/html/GUID-D052D155-EF55-4D19-A70F-B9A8FA86A6D3.html#:~:text=The%20Domain%20Name%20System%20(DNS,name%20and%20IP%20address%20lookups.

So, is basically if someone wishes to Block DNS resolution, then the template just says: if blockDNSResolution=true, then do NOT allow outgoing traffic via UDP port 53. So is an generic additional feature, I guess you could include it also here.

hkailantzis avatar Mar 13 '24 17:03 hkailantzis

So, is basically if someone wishes to Block DNS resolution, then the template just says: if blockDNSResolution=true, then do NOT allow outgoing traffic via UDP port 53. So is an generic additional feature, I guess you could include it also here.

@hkailantzis thanks for the additional info, I hope this gets merged after the awaited additional review.

Sheikh-Abubaker avatar Mar 14 '24 09:03 Sheikh-Abubaker

@hkailantzis can you please help me in testing the network policy, I'm facing some issues:

Here are the insights from my system:

$ helm install my-tempo ./tempo-1.8.0.tgz
NAME: my-tempo
LAST DEPLOYED: Wed Mar 27 03:58:51 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl get all
NAME             READY   STATUS    RESTARTS   AGE
pod/my-tempo-0   1/1     Running   0          18s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)
            AGE
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP
            36s
service/my-tempo     ClusterIP   10.102.51.181   <none>        3100/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55681/TCP,4317/TCP,4318/TCP,55678/TCP   18s

NAME                        READY   AGE
statefulset.apps/my-tempo   1/1     18s
$ kubectl get networkpolicy
NAME       POD-SELECTOR                                                       AGE
my-tempo   app.kubernetes.io/instance=my-tempo,app.kubernetes.io/name=tempo   30s

$ kubectl describe networkpolicy my-tempo
Name:         my-tempo
Namespace:    default
Created on:   2024-03-27 03:58:51 +0530 IST
Labels:       app.kubernetes.io/instance=my-tempo
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=tempo
              app.kubernetes.io/version=2.3.1
              helm.sh/chart=tempo-1.8.0
Annotations:  meta.helm.sh/release-name: my-tempo
              meta.helm.sh/release-namespace: default
Spec:
  PodSelector:     app.kubernetes.io/instance=my-tempo,app.kubernetes.io/name=tempo
  Allowing ingress traffic:
    To Port: <nil>/TCP
    From: <any> (traffic not restricted by source)
  Not affecting egress traffic
  Policy Types: Ingress
$ kubectl describe service my-tempo
Name:              my-tempo
Namespace:         default
Labels:            app.kubernetes.io/instance=my-tempo
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=tempo
                   app.kubernetes.io/version=2.3.1
                   helm.sh/chart=tempo-1.8.0
Annotations:       meta.helm.sh/release-name: my-tempo
                   meta.helm.sh/release-namespace: default
Selector:          app.kubernetes.io/instance=my-tempo,app.kubernetes.io/name=tempo
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.102.51.181
IPs:               10.102.51.181
Port:              tempo-prom-metrics  3100/TCP
TargetPort:        3100/TCP
Endpoints:         10.244.0.3:3100
Port:              tempo-jaeger-thrift-compact  6831/UDP
TargetPort:        6831/UDP
Endpoints:         10.244.0.3:6831
Port:              tempo-jaeger-thrift-binary  6832/UDP
TargetPort:        6832/UDP
Endpoints:         10.244.0.3:6832
Port:              tempo-jaeger-thrift-http  14268/TCP
TargetPort:        14268/TCP
Endpoints:         10.244.0.3:14268
Port:              grpc-tempo-jaeger  14250/TCP
TargetPort:        14250/TCP
Endpoints:         10.244.0.3:14250
Port:              tempo-zipkin  9411/TCP
TargetPort:        9411/TCP
Endpoints:         10.244.0.3:9411
Port:              tempo-otlp-legacy  55680/TCP
TargetPort:        55680/TCP
Endpoints:         10.244.0.3:55680
Port:              tempo-otlp-http-legacy  55681/TCP
TargetPort:        4318/TCP
Endpoints:         10.244.0.3:4318
Port:              grpc-tempo-otlp  4317/TCP
TargetPort:        4317/TCP
Endpoints:         10.244.0.3:4317
Port:              tempo-otlp-http  4318/TCP
TargetPort:        4318/TCP
Endpoints:         10.244.0.3:4318
Port:              tempo-opencensus  55678/TCP
TargetPort:        55678/TCP
Endpoints:         10.244.0.3:55678
Session Affinity:  None
Events:            <none>
$ kubectl run busybox --rm -ti --namespace=default --image=busybox:1.28 --labels="app.kubernetes.io/instance=my-tempo,app.kubernetes.io/name=tempo" -- /bin/sh
If you don't see a command prompt, try pressing enter.
/ # wget -O - http://my-tempo:3100
Connecting to my-tempo:3100 (10.102.51.181:3100)
wget: can't connect to remote host (10.102.51.181): Connection refused

I've been trying to figure out this Connection refused error though I'm trying to connect using valid labels and namespace can you please look into this any suggestions would help!

Sheikh-Abubaker avatar Mar 26 '24 23:03 Sheikh-Abubaker

Tried again with different method this time

$ helm install my-tempo ./tempo-1.8.0.tgz
NAME: my-tempo
LAST DEPLOYED: Fri Apr  5 01:02:10 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

$ kubectl get all

NAME                         READY   STATUS    RESTARTS   AGE
pod/my-tempo-0               1/1     Running   0          6s
pod/nginx-7854ff8877-4jdnr   1/1     Running   0          32m

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)
             AGE
service/kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP
             24h
service/my-tempo     ClusterIP   10.110.185.197   <none>        3100/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55681/TCP,4317/TCP,4318/TCP,55678/TCP   6s
service/nginx        ClusterIP   10.111.118.2     <none>        80/TCP
             32m

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   1/1     1            1           32m

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-7854ff8877   1         1         1       32m

NAME                        READY   AGE
statefulset.apps/my-tempo   1/1     6s

Trying connecting to my-tempo service from nginx pod without appropriate labes

$ kubectl get pods -n default --show-labels
NAME                     READY   STATUS    RESTARTS   AGE    LABELS
my-tempo-0               1/1     Running   0          7m7s   app.kubernetes.io/instance=my-tempo,app.kubernetes.io/name=tempo,apps.kubernetes.io/pod-index=0,controller-revision-hash=my-tempo-958864bc7,statefulset.kubernetes.io/pod-name=my-tempo-0
nginx-7854ff8877-4jdnr   1/1     Running   0          39m    app=nginx,pod-template-hash=7854ff8877


$ kubectl exec nginx-7854ff8877-4jdnr -n default -- nc -vz my-tempo 3100
nc: connect to my-tempo (10.110.185.197) port 3100 (tcp) failed: Connection timed out
command terminated with exit code 1

Connecting to my-tempo service again from nginx pod but this time with appropriate labels

$ kubectl describe pod nginx-7854ff8877-4jdnr
Name:             nginx-7854ff8877-4jdnr
Namespace:        default
Priority:         0
Service Account:  default
Node:             multinode-demo-m02/192.168.58.3
Start Time:       Fri, 05 Apr 2024 00:29:41 +0530
Labels:           app=nginx
                  app.kubernetes.io/instance=my-tempo
                  app.kubernetes.io/name=tempo
                  app.kubernetes.io/version=2.3.1
                  helm.sh/chart=tempo-1.8.0
                  my-tempo-client=true
                  pod-template-hash=7854ff8877
                  role=read
Annotations:      cni.projectcalico.org/containerID: 8c48ed2c4c4656d46334c7938f43c0b45520ed06f2d209b832d1dcace821ae4f
                  cni.projectcalico.org/podIP: 10.244.239.11/32
                  cni.projectcalico.org/podIPs: 10.244.239.11/32
Status:           Running
IP:               10.244.239.11
IPs:
  IP:           10.244.239.11
Controlled By:  ReplicaSet/nginx-7854ff8877
Containers:
  nginx:
    Container ID:   docker://4461cd0b346eec18d0b05e3bea8e932147c1850eb45674d1def0282802246d41
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:6db391d1c0cfb30588ba0bf72ea999404f2764febf0f1f196acd5867ac7efa7e
    
$ kubectl exec nginx-7854ff8877-4jdnr -n default -- nc -vz my-tempo 3100
Connection to my-tempo (10.110.185.197) 3100 port [tcp/*] succeeded!

This PR is all set to be merged now.

Sheikh-Abubaker avatar Apr 04 '24 20:04 Sheikh-Abubaker

@hkailantzis I wanted to ask you one thing in values.yaml networkPolicy.allowExternal is set to true, in this case it allows pods to connect to service even if the pod doesn't have required labels requiring only correct destination port, but if it is set to false in that case it only allows pods with appropriate labels to connect to service, what do you suggest should networkPolicy.allowExternal default to true or false ??

Sheikh-Abubaker avatar Apr 04 '24 20:04 Sheikh-Abubaker

hi @Sheikh-Abubaker , I would suggest to set it true, as the main grafana chart is setting it. Thanks! Do you need still help with testing this PR ?

hkailantzis avatar Apr 06 '24 09:04 hkailantzis

hi @Sheikh-Abubaker , I would suggest to set it true, as the main grafana chart is setting it. |

alright 👍.

Thanks! Do you need still help with testing this PR ?

Thanks for asking but I've already tested it with another method and it works fine as expected, you can checkout the above updated comments where I tested it using nginx pod.

As of now this PR is all set to be merged.

Sheikh-Abubaker avatar Apr 06 '24 11:04 Sheikh-Abubaker

@zalegrala can you please approve this PR.

Sheikh-Abubaker avatar Jun 03 '24 14:06 Sheikh-Abubaker

I've no way to validate this change and perhaps seems quite site-dependent. Would a generic "extraObjects" allow for the change you're after, similar to what is currently in tempo-distributed? https://github.com/grafana/helm-charts/blob/main/charts/tempo-distributed/values.yaml#L2150C1-L2150C13

zalegrala avatar Jun 03 '24 20:06 zalegrala

Would a generic "extraObjects" allow for the change you're after, similar to what is currently in tempo-distributed? https://github.com/grafana/helm-charts/blob/main/charts/tempo-distributed/values.yaml#L2150C1-L2150C13

Yes that would work but the original issue #2890 was about defining a Network Policy to be able to accept incoming traffic from other components, so I went on with defining Network Policy to get rid of manual definition that the user stated about.

Sheikh-Abubaker avatar Jun 04 '24 10:06 Sheikh-Abubaker

From my perspective, I think its more important that we have extension points so that site-specific requirements can be made without needing to maintain all those specifics in the chart. Forgive my ignorance around the NetworkPolicy object, since I've not used it. If we create that extension point, then it seems that we empower users to extend the chart in ways that are useful to them, and we can supplement with docs or other ways to demonstrate how some of these specifics were achieved without holding the maintenance burden on the chart. I see the request was also made for the tempo-distributed chart, but considering the extraObjects support there, it would be good to know if this works for their use case.

zalegrala avatar Jun 04 '24 14:06 zalegrala

Forgive my ignorance around the NetworkPolicy object, since I've not used it.

@zalegrala No worries we're all here to learn and giveback, let me break it down for you in very simple terms, we have a set of rules(policies) using which we control the traffic flow in/out within our network, you can go through the conversations of this PR for a detailed insight on how it works.

but considering the extraObjects support there, it would be good to know if this works for their use case.

sounds good, what do you suggest now ? should I keep this or change it to support extraObjects I am open to both, let me know your views.

Sheikh-Abubaker avatar Jun 04 '24 15:06 Sheikh-Abubaker

I think let's learn if the extraObjects support would solve for the issue at hand. If so, let's line up the charts to use that approach. I'm happy to continue the discussion as well, perhaps others feel strongly. My view is probably rooted more in a desire to keep this particular chart as simple as we can keep it.

zalegrala avatar Jun 04 '24 15:06 zalegrala

Sorry, I see now that I'm a little late to the conversation as well.

zalegrala avatar Jun 04 '24 15:06 zalegrala

I think let's learn if the extraObjects support would solve for the issue at hand. If so, let's line up the charts to use that approach.

No worries I can proceed with extraObjects part too.

My view is probably rooted more in a desire to keep this particular chart as simple as we can keep it.

It would be simple after Network Policy too because we are not enabling it by default, but if user wants to enable it they can do it on their own as per their requirement just like: https://github.com/grafana/helm-charts/blob/1b48dcc6a74adae15fc3e183858025acc353cc78/charts/grafana/values.yaml#L1240C1-L1243C17

Sheikh-Abubaker avatar Jun 04 '24 15:06 Sheikh-Abubaker

I think let's learn if the extraObjects support would solve for the issue at hand. If so, let's line up the charts to use that approach. I'm happy to continue the discussion as well, perhaps others feel strongly. My view is probably rooted more in a desire to keep this particular chart as simple as we can keep it.

@zalegrala I really fail to see, why Tempo should differ on this, when e.g. grafana, Loki etc. are using NetworkPolicy specific templates already...isn't better to unify the approach, for consistency, common pattern etc ? or perhaps I'm missing something... ? :-)

and as @Sheikh-Abubaker stated, is quite the pain for users to maintain such template on their own...and is kind of clean since is up to a boolean to enable or not...and set to false by default. Aslo, NP, is not some exotic resource, is quite basic k8s resource, used in security context, similar to RBAC, is just network traffic specific. In this case, is may as well be specified and let user enabling or not. And in addition, if done also here, it could make sense to also apply in the Tempo-distributed also (in a separate PR).

https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/templates/networkpolicy.yaml

https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/networkpolicy.yaml https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/image-renderer-network-policy.yaml

hkailantzis avatar Jun 06 '24 00:06 hkailantzis

Hi @hkailantzis, thank you for leaving your thoughts. I agree that it would be nice to unify certain portions of the charts where we can. Perhaps this is one of those areas. Many of the charts in this repo are entirely community maintained, and so there isn't a unified approach here unless it comes from the community, which arguably hasn't, and Grafana hasn't done much to push standards here as well. I think testing the charts could be a good step, since we have no diff to review in this PR, etc. Those kinds of standards would be good to push on in my oppinion.

My questions around the NetworkPolicy object made no mention of an exotic resource, just one that I'd not used and have no way to validate. As to the other charts containing the this resource already, respectfully, I was not directly pinged to review those PRs.

The part that I struggle to understand is why its better to invent a new yaml struct rather than providing users direct access to render full kubernetes objects in the chart when they aren't specific to Tempo. In the extraObjects case discussed above, it seems to me that there would be no additional template that needed to be managed on the user side, and users would be able to put exactly the details about the NetworkPolicy in the values. This would mean we don't have to wonder about why we are including an option for blocking egress DNS traffic, etc. This goes to the point above I was mentioning creating the extension points, rather than site specifics, though I agree the single boolean covers us in this case. If including the NetworkPolicy here is the right move, I have no intention of standing in the way, but I would like to understand this point.

As someone who gets regularly pinged for review on the charts of which I am not a user, its good for me to have a working understanding of the needs, so thank you for your patience while I work to understand the challenges users face.

zalegrala avatar Jun 07 '24 20:06 zalegrala

Sorry for the delay, I've been traveling.

No worries it's alright 😄!!

Thanks for the contributions @Sheikh-Abubaker. Keep them coming.

You're welcome and thanks for your support too!! I'd definitely keep it coming 🎉!

Sheikh-Abubaker avatar Jun 17 '24 14:06 Sheikh-Abubaker

Hey @zalegrala the original issue #2890 also mentioned about having this feature enabled for tempo-distributed, so should I raise a similar PR for tempo-distributed too ??

Sheikh-Abubaker avatar Jun 17 '24 14:06 Sheikh-Abubaker

If its valuable to someone, sure feel free. If we are making changes that nobody is using then :shrug: I guess.

zalegrala avatar Jun 20 '24 17:06 zalegrala

If its valuable to someone, sure feel free. If we are making changes that nobody is using then 🤷 I guess.

Yes it is, as the original issue #2890 wanted this feature for tempo-distributed too!

Sheikh-Abubaker avatar Jun 21 '24 12:06 Sheikh-Abubaker

hi @Sheikh-Abubaker , thanks for this!. It works like a charm, with the 'client' label etc. Sorry for the delay of the reply. I kinda noticed that 'client' label is basically not taken into account in the cross-namespace selector case. (its '-', so its now an 'OR' operation. e.g:

- podSelector:
    matchLabels:
      tempo-sre-client: 'true'
- namespaceSelector:
    matchLabels:
      env: test

when for cross-namespace, the 'client' label should be also present right ? otherwise any pods from another namespace can access Tempo. e.g: notice the lack of '-' in the second podSelector, which denotes and AND operation. (YAML joy , I know...)

- podSelector:
    matchLabels:
      tempo-sre-client: 'true'
- namespaceSelector:
    matchLabels:
      env: test
  podSelector:
    matchLabels:
      tempo-sre-client: 'true'

Explanation with example here: https://kubernetes.io/docs/concepts/services-networking/network-policies/#behavior-of-to-and-from-selectors

hkailantzis avatar Jun 25 '24 07:06 hkailantzis

Hey there @hkailantzis I have addressed your comment in the PR #3203, checkout and let me know if it is something you were looking for.

Sheikh-Abubaker avatar Jul 03 '24 10:07 Sheikh-Abubaker

@Sheikh-Abubaker looks good! thanks!

hkailantzis avatar Jul 04 '24 09:07 hkailantzis