opentelemetry-operator icon indicating copy to clipboard operation
opentelemetry-operator copied to clipboard

Support sidecar injection in another pod namespace different from the otel colletor instance.

Open rubenvp8510 opened this issue 4 years ago • 25 comments

If I have an OpenTelemetry collector on sidecar mode in one namespace (eg, test1) but want a pod on other namespace (test2) to be injected, this is not actually possible with the opentelemetry-operator.

Steps to reproduce:

  1. Install opentelemetry-operator
  2. Create namespace test1 kubectl create namespace test1
  3. Create an opentelemetry collector instance mode=sidecar:
kubectl apply -n test1 -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  mode: sidecar
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
    processors:
      queued_retry:

    exporters:
      logging:

    service:
      pipelines:
        traces:
          receivers: [jaeger]
          processors: [queued_retry]
          exporters: [logging]
EOF
  1. Create a pod in namespace test1
kubectl apply -n test1 -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
  annotations:
    sidecar.opentelemetry.io/inject: "true"
spec:
  containers:
  - name: myapp
    image: jaegertracing/vertx-create-span:operator-e2e-tests
    ports:
      - containerPort: 8080
        protocol: TCP
EOF
  1. Verify that sidecar is injected
kubectl get pods -n test1
NAME    READY   STATUS              RESTARTS   AGE
myapp   0/2     ContainerCreating   0          14s
  1. Create namespace test2 kubectl create namespace test2
  2. Create pod on namespace `test2
kubectl apply -n test1 -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
  annotations:
    sidecar.opentelemetry.io/inject: "true"
spec:
  containers:
  - name: myapp
    image: jaegertracing/vertx-create-span:operator-e2e-tests
    ports:
      - containerPort: 8080
        protocol: TCP
EOF
```
7)Sidecar is not injected
```
kubectl get pods -n test2
NAME    READY   STATUS    RESTARTS   AGE
myapp   1/1     Running   0          5s
```

rubenvp8510 avatar Feb 15 '21 18:02 rubenvp8510

I would like to take a jab at this. Able to reproduce this - I think we are checking for eligible collector instances in the same namespace as the pod( Namespace mentioned in AdmissionReview Request). I can see below error log in operator-manager

"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test2","name":"myapp","error":"no OpenTelemetry Collector instances available",

If we want to support this we need search of collector instances with given name (passed in annotation) in all namespaces. I guess this would mean a change here -

https://github.com/open-telemetry/opentelemetry-operator/blob/0eec32d96669b674481f05cd1c8e36ab8a0139ac/internal/podinjector/main.go#L169-L171

@jpkrohling what do you think?

bhiravabhatla avatar Mar 03 '21 07:03 bhiravabhatla

There are a few things to consider for this one:

  • I believe we have a WATCH_NAMESPACES env var, telling the operator which namespaces to watch for otelcol CRs. Should we reuse this for this feature, and only consider CRs in those namespaces as eligible to be injected? Or should we have another flag/env var?
  • RBAC - what if the user creating the otelcol CR in the namespace ns1 doesn't have permissions to change pods in n2? What if the service account being used by the CR in the ns1 namespace doesn't have the permissions?

I'm not sure we want to add support for this at this moment, without users coming to us with their concrete use cases.

jpkrohling avatar Mar 03 '21 09:03 jpkrohling

I believe we have a WATCH_NAMESPACES env var, telling the operator which namespaces to watch for otelcol CRs. Should we reuse this for this feature

Yes, But I think this should only solve the problem of providing namespaces to operator to look for collector CRs. Its best not to use the same thing for side car injection. (Seperation of concerns).

Or should we have another flag/env var?

I think right now, user can set the annotation for injection to either true/false or a otel collector name right (correct me If I am wrong). I guess we can use the same flag and look for namespaced name as well (ex: [namespace]/[collectorName]).

RBAC - what if the user creating the otelcol CR in the namespace ns1 doesn't have permissions to change pods in n2? What if the service account being used by the CR in the ns1 namespace doesn't have the permissions?

This would be a genuine concern. I guess we would have to log and error and allow the pod to be created ( as we do now for other corner cases) and document the RBAC requirement.

Actually, If I am not wrong operator (webhook server) is responsible for injection right. If operator service account has a cluster role to update/patch pods - it should work right.

I'm not sure we want to add support for this at this moment, without users coming to us with their concrete use cases.

I think this expection comes from other apps which inject side cars (for example service mesh tooling - Linkerd/Istio). They do support injection across namespaces - Although I am not sure if we have any other operators doing this.

If we decide not add support at this moment. I think we should document the decision & also have way to notify user why sidecar was not injected ( right now, you can only get it from operator logs). Can we publish an event for the pod for the same?

bhiravabhatla avatar Mar 04 '21 03:03 bhiravabhatla

If operator service account has a cluster role to update/patch pods - it should work right.

The operator is certainly able to inject, the concern is whether it should just trust what the user requested, or if it should do the authorization. This is a concern for multi-tenant environments, where each tenant is located on their own namespace.

They do support injection across namespaces

That's a bit different: there's only one Istio control plane responsible for a given number of namespaces, and sidecars are injected to talk to that control plane, meaning that users can't decide which control plane to talk to.

jpkrohling avatar Mar 04 '21 10:03 jpkrohling

If we decide not add support at this moment. I think we should document the decision & also have way to notify user why sidecar was not injected ( right now, you can only get it from operator logs). Can we publish an event for the pod for the same?

I think that's the real action item here. We could either publish an event or add a message to the status object (my preference for this case). We do this for upgrades: https://github.com/open-telemetry/opentelemetry-operator/blob/c1ac1df2bf06cc84265ab6df99bbf37fd313338c/pkg/collector/upgrade/v0_9_0.go#L49

jpkrohling avatar Mar 04 '21 10:03 jpkrohling

Pod status looks more appropriate for this use-case. Can I work on this ? @jpkrohling

bhiravabhatla avatar Mar 04 '21 14:03 bhiravabhatla

Go ahead!

jpkrohling avatar Mar 04 '21 15:03 jpkrohling

@jpkrohling Was away on vacation. Would work on this in this week.

bhiravabhatla avatar Mar 08 '21 03:03 bhiravabhatla

@jpkrohling Pod status has a message field and I tried writing the error message to it. But looks like its being overwritten by k8s controller. I tried adding custom status condition as well, dint work out - overwritten.

It worked in upgrade case - as no other controller except ours mutates/changes the fields of CRD.

I guess we dont have a choice but to create an event for our case. Kindly share your thoughts.

bhiravabhatla avatar Mar 11 '21 05:03 bhiravabhatla

Have you tried using the Status() function from the client? Writing the status object should be done with a special writer, like this:

https://github.com/open-telemetry/opentelemetry-operator/blob/c1ac1df2bf06cc84265ab6df99bbf37fd313338c/pkg/collector/upgrade/upgrade.go#L64

If you did try that, then I'd be fine with creating an event instead.

jpkrohling avatar Mar 11 '21 12:03 jpkrohling

@jpkrohling - Nope dint try that. Will try and let you know

bhiravabhatla avatar Mar 12 '21 05:03 bhiravabhatla

@jpkrohling - Sorry for not responding soon. I was stuck in my day job. I tried Status() writer, dint work out. Its getting overwritten.

Can we publish an event instead?

bhiravabhatla avatar Apr 08 '21 06:04 bhiravabhatla

I think you had a PR for this already, no? I'll give it a try and see if I can make writing to the status object work.

jpkrohling avatar Apr 08 '21 13:04 jpkrohling

Sure. I dont have a PR for this yet. I ll raise one with event approach - if you are able to write the status, we could close my PR

bhiravabhatla avatar Apr 09 '21 10:04 bhiravabhatla

Sounds good!

jpkrohling avatar Apr 09 '21 12:04 jpkrohling

@jpkrohling

I also tried to update the status of the Pod using the Status() function but it is getting overwritten. I also tried to publish the event, the event is getting published (i.e. I can see it in Kubectl get events output ) but it is not showing in the pod describe output. Is it expected behavior?

$kubectl get events

0s          Warning   SidecarInjectionRejected   pod/myapp                                        failed to select an OpenTelemetry Collector instance for the default/myapp pod's sidecar because no OpenTelemetry Collector instances available
0s          Normal    Scheduled                  pod/myapp                                        Successfully assigned default/myapp to docker-desktop
0s          Normal    Pulled                     pod/myapp                                        Container image "jaegertracing/vertx-create-span:operator-e2e-tests" already present on machine
0s          Normal    Created                    pod/myapp                                        Created container myapp
0s          Normal    Started                    pod/myapp                                        Started container myapp

$Kubectl describe pod/myapp

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  83s   default-scheduler  Successfully assigned default/myapp to docker-desktop
  Normal  Pulled     83s   kubelet            Container image "jaegertracing/vertx-create-span:operator-e2e-tests" already present on machine
  Normal  Created    83s   kubelet            Created container myapp
  Normal  Started    82s   kubelet            Started container myapp

avadhut123pisal avatar Mar 05 '22 11:03 avadhut123pisal

@jpkrohling, @pavolloffay, This error still on 0.47.0, then I'll check it.

opentelemetry-operator-controller-manager-5df798fd7d-n88xx manager {"level":"error","ts":1647952137.5868814,"msg":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"","name":"","error":"no OpenTelemetry Collector instances available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector)
~# kubectl logs -n test1 my-app-7cfdf6d5bb-wn8sr -c otc-container
Error: failed to get config: cannot unmarshal the configuration: unknown processors type "queued_retry" for "queued_retry" (valid values: [filter batch memory_limiter attributes resource span probabilistic_sampler])
2022/03/22 12:30:48 collector server run finished with error: failed to get config: cannot unmarshal the configuration: unknown processors type "queued_retry" for "queued_retry" (valid values: [filter batch memory_limiter attributes resource span probabilistic_sampler])

yuriolisa avatar Mar 22 '22 12:03 yuriolisa

/assign

yuriolisa avatar Mar 22 '22 12:03 yuriolisa

Hi, the same issue exists for auto-instrumentation, where we get an error message

"msg":"failed to select an OpenTelemetry Instrumentation instance for this pod","namespace":"","name":"","error":"no OpenTelemetry Instrumentation instances available"

when the operator tries to mutate a pod that runs in a namespace where no Instrumentation CRD exists. Reason being that Instrumenation CRDs are only searched in the same namespace as the pod:

https://github.com/open-telemetry/opentelemetry-operator/blob/ecb139c9dc453961f695a186d665666cdd1dfd3c/pkg/instrumentation/podmutator.go#L124-L126

Is this also a use case that will not be supported in the future? Thanks very much

trutty avatar Mar 30 '22 15:03 trutty

Just adding in my 2¢ on this; we're putting together a centralised approach to instrumenting with otel with sidecars and each team individually has control of a namespace or two. Ideally we'd just tell them to add the annotation to inject the sidecar (which we can then manage the configuration of centrally). There's also the fact that istio supports this pattern to be honest!

I'm happy to pick this up and implement the fix but would want some affirmation that this direction is acceptable. My thinking would be to add an optional field into the CR for namespaces istio style

...
namespaces:
 - .  # current namespace; default behaviour
 - *  # all namespaces rbac permitting
 - mynamespace  # explicitly defined
...

naphta avatar May 10 '22 09:05 naphta

I think this is the direction we want to go @naphta . Please do.

dcw329 avatar Jun 01 '22 01:06 dcw329

I agree with @dcw329 - We also want to centrally manage our instrumentation configuration. go for it @naphta.

E-Fallar avatar Jun 01 '22 02:06 E-Fallar

@naphta, could you please check if #889 will attend your proposed change?

yuriolisa avatar Jun 01 '22 07:06 yuriolisa

@pavolloffay, @jpkrohling, I believe we could close this issue based on #889, correct?

yuriolisa avatar Jul 15 '22 10:07 yuriolisa

This is the comment https://github.com/open-telemetry/opentelemetry-operator/pull/889#discussion_r886838314

Loudly speaking the operator could create the config map in the new namespace if needed. Therefore I would keep this issue open.

pavolloffay avatar Jul 18 '22 16:07 pavolloffay

I already added the annotation sidecar.opentelemetry.io/inject: 'true' to the Pod, but it seems the sidecar can only be injected into the pod with the same namespace as the CRD (tested).

Do I need to add any other annotation for the collector sidecar to be injected into the pod? @yuriolisa @pavolloffay @naphta

Or the only current workaround is 1 CRD : 1 namespace?

anh-oolio avatar Nov 28 '22 12:11 anh-oolio