opentelemetry-operator
Support sidecar injection in a pod namespace different from the OTel collector instance's.
If I have an OpenTelemetry collector in sidecar mode in one namespace (e.g., test1) but want a pod in another namespace (test2) to be injected, this is not currently possible with the opentelemetry-operator.
Steps to reproduce:
- Install the opentelemetry-operator.
- Create namespace `test1`:
```
kubectl create namespace test1
```
- Create an OpenTelemetry collector instance with `mode: sidecar`:
```
kubectl apply -n test1 -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  mode: sidecar
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
    processors:
      queued_retry:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [jaeger]
          processors: [queued_retry]
          exporters: [logging]
EOF
```
- Create a pod in namespace test1
kubectl apply -n test1 -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: myapp
labels:
app: myapp
annotations:
sidecar.opentelemetry.io/inject: "true"
spec:
containers:
- name: myapp
image: jaegertracing/vertx-create-span:operator-e2e-tests
ports:
- containerPort: 8080
protocol: TCP
EOF
- Verify that the sidecar is injected:
```
kubectl get pods -n test1
NAME    READY   STATUS              RESTARTS   AGE
myapp   0/2     ContainerCreating   0          14s
```
- Create namespace `test2`:
```
kubectl create namespace test2
```
- Create a pod in namespace `test2`:
```
kubectl apply -n test2 -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
  annotations:
    sidecar.opentelemetry.io/inject: "true"
spec:
  containers:
  - name: myapp
    image: jaegertracing/vertx-create-span:operator-e2e-tests
    ports:
    - containerPort: 8080
      protocol: TCP
EOF
```
- Verify that the sidecar is not injected:
```
kubectl get pods -n test2
NAME    READY   STATUS    RESTARTS   AGE
myapp   1/1     Running   0          5s
```
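For reference, you can confirm from the CLI which containers were injected and which collector instances each namespace can see (the sidecar container name `otc-container` appears later in this thread; `opentelemetrycollectors` is the CRD's plural name):
```
kubectl get pod myapp -n test1 -o jsonpath='{.spec.containers[*].name}'
# myapp otc-container
kubectl get pod myapp -n test2 -o jsonpath='{.spec.containers[*].name}'
# myapp
kubectl get opentelemetrycollectors -n test2
# No resources found in test2 namespace.
```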
I would like to take a jab at this. I am able to reproduce it - I think we check for eligible collector instances only in the same namespace as the pod (the namespace mentioned in the AdmissionReview request). I can see the following error log in the operator manager:
```
"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test2","name":"myapp","error":"no OpenTelemetry Collector instances available",
```
If we want to support this, we would need to search for collector instances with the given name (passed in the annotation) across all namespaces. I guess this would mean a change here:
https://github.com/open-telemetry/opentelemetry-operator/blob/0eec32d96669b674481f05cd1c8e36ab8a0139ac/internal/podinjector/main.go#L169-L171
@jpkrohling what do you think?
There are a few things to consider for this one:
- I believe we have a `WATCH_NAMESPACES` env var, telling the operator which namespaces to watch for otelcol CRs. Should we reuse this for this feature, and only consider CRs in those namespaces as eligible to be injected? Or should we have another flag/env var?
- RBAC - what if the user creating the otelcol CR in the namespace `ns1` doesn't have permissions to change pods in `ns2`? What if the service account being used by the CR in the `ns1` namespace doesn't have the permissions?
I'm not sure we want to add support for this at this moment, without users coming to us with their concrete use cases.
> I believe we have a `WATCH_NAMESPACES` env var, telling the operator which namespaces to watch for otelcol CRs. Should we reuse this for this feature
Yes, but I think this would only solve the problem of telling the operator which namespaces to look in for collector CRs. It's best not to reuse the same mechanism for sidecar injection (separation of concerns).
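For illustration, this is roughly how such an env var would be set on the operator Deployment; the variable name comes from this thread, and the comma-separated value format is an assumption:
```
# snippet of the operator Deployment's container spec (value format assumed)
env:
- name: WATCH_NAMESPACES
  value: "test1,test2"
```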
> Or should we have another flag/env var?
I think that right now the user can set the injection annotation to either true/false or an OTel collector name (correct me if I am wrong). I guess we could use the same annotation and also accept a namespaced name (e.g. [namespace]/[collectorName]).
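A sketch of what that could look like on the pod; the namespaced-name value is the proposal under discussion here, not a format the operator necessarily supports:
```
metadata:
  annotations:
    # collector "simplest" in namespace "test1" (proposed namespaced form)
    sidecar.opentelemetry.io/inject: "test1/simplest"
```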
> RBAC - what if the user creating the otelcol CR in the namespace `ns1` doesn't have permissions to change pods in `ns2`? What if the service account being used by the CR in the `ns1` namespace doesn't have the permissions?
This is a genuine concern. I guess we would have to log an error and allow the pod to be created (as we do now for other corner cases), and document the RBAC requirement.
Actually, if I am not wrong, the operator (webhook server) is responsible for the injection. If the operator's service account has a cluster role to update/patch pods, it should work, right?
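For illustration, the kind of cluster-wide permission that assumption implies for the operator's service account (a minimal sketch; the role name is made up):
```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-operator-pod-patcher  # illustrative name
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "update", "patch"]
```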
> I'm not sure we want to add support for this at this moment, without users coming to us with their concrete use cases.
I think this expectation comes from other apps that inject sidecars (for example, service mesh tooling such as Linkerd/Istio). They do support injection across namespaces, although I am not sure whether any other operators do this.
If we decide not to add support at this moment, I think we should document the decision and also have a way to notify the user why the sidecar was not injected (right now, you can only get that from the operator logs). Can we publish an event on the pod for this?
> If the operator's service account has a cluster role to update/patch pods, it should work, right?
The operator is certainly able to inject; the concern is whether it should just trust what the user requested, or whether it should do the authorization itself. This is a concern for multi-tenant environments, where each tenant is located in their own namespace.
> They do support injection across namespaces
That's a bit different: there's only one Istio control plane responsible for a given set of namespaces, and sidecars are injected to talk to that control plane, meaning that users can't decide which control plane to talk to.
> If we decide not to add support at this moment, I think we should document the decision and also have a way to notify the user why the sidecar was not injected (right now, you can only get that from the operator logs). Can we publish an event on the pod for this?
I think that's the real action item here. We could either publish an event or add a message to the status object (my preference for this case). We do this for upgrades: https://github.com/open-telemetry/opentelemetry-operator/blob/c1ac1df2bf06cc84265ab6df99bbf37fd313338c/pkg/collector/upgrade/v0_9_0.go#L49
Pod status looks more appropriate for this use case. Can I work on this, @jpkrohling?
Go ahead!
@jpkrohling Was away on vacation. Will work on this this week.
@jpkrohling The pod status has a message field and I tried writing the error message to it, but it looks like it's being overwritten by the Kubernetes controller. I tried adding a custom status condition as well; it didn't work out - overwritten.
It worked in the upgrade case because no controller other than ours mutates the CRD's fields.
I guess we don't have a choice but to create an event in our case. Kindly share your thoughts.
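The overwriting can be reproduced by hand, assuming kubectl >= 1.24 (which added the --subresource flag); the kubelet owns pod status and reconciles it, which is consistent with what you're seeing:
```
# Patch the status subresource directly (assumes kubectl >= 1.24)
kubectl patch pod myapp -n test2 --subresource=status --type=merge \
  -p '{"status":{"message":"sidecar injection rejected"}}'
# Shortly afterwards the field is reset by the kubelet:
kubectl get pod myapp -n test2 -o jsonpath='{.status.message}'
```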
Have you tried using the `Status()` function from the client? Writing the status object should be done with a special writer, like this:
https://github.com/open-telemetry/opentelemetry-operator/blob/c1ac1df2bf06cc84265ab6df99bbf37fd313338c/pkg/collector/upgrade/upgrade.go#L64
If you did try that, then I'd be fine with creating an event instead.
@jpkrohling - Nope, didn't try that. Will try and let you know.
@jpkrohling - Sorry for not responding sooner; I was stuck in my day job. I tried the Status() writer; it didn't work out - it's getting overwritten.
Can we publish an event instead?
I think you had a PR for this already, no? I'll give it a try and see if I can make writing to the status object work.
Sure, I don't have a PR for this yet. I'll raise one with the event approach - if you are able to write the status, we could close my PR.
Sounds good!
I also tried to update the status of the pod using the Status() function, but it is getting overwritten. I also tried to publish an event; the event is published (i.e., I can see it in the kubectl get events output), but it does not show in the pod's describe output. Is that expected behavior?
```
$ kubectl get events
0s   Warning   SidecarInjectionRejected   pod/myapp   failed to select an OpenTelemetry Collector instance for the default/myapp pod's sidecar because no OpenTelemetry Collector instances available
0s   Normal    Scheduled                  pod/myapp   Successfully assigned default/myapp to docker-desktop
0s   Normal    Pulled                     pod/myapp   Container image "jaegertracing/vertx-create-span:operator-e2e-tests" already present on machine
0s   Normal    Created                    pod/myapp   Created container myapp
0s   Normal    Started                    pod/myapp   Started container myapp
```
```
$ kubectl describe pod/myapp
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  83s   default-scheduler  Successfully assigned default/myapp to docker-desktop
  Normal  Pulled     83s   kubelet            Container image "jaegertracing/vertx-create-span:operator-e2e-tests" already present on machine
  Normal  Created    83s   kubelet            Created container myapp
  Normal  Started    82s   kubelet            Started container myapp
```
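One possible explanation (an assumption, not verified here): kubectl describe matches events against the involved object's UID, and an event recorded during a mutating webhook references a pod that has no UID yet. Listing events by the object's name still finds it:
```
kubectl get events --field-selector involvedObject.name=myapp,involvedObject.namespace=default
```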
@jpkrohling, @pavolloffay, this error still occurs on 0.47.0; I'll check it.
```
opentelemetry-operator-controller-manager-5df798fd7d-n88xx manager {"level":"error","ts":1647952137.5868814,"msg":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"","name":"","error":"no OpenTelemetry Collector instances available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector)
```
```
~# kubectl logs -n test1 my-app-7cfdf6d5bb-wn8sr -c otc-container
Error: failed to get config: cannot unmarshal the configuration: unknown processors type "queued_retry" for "queued_retry" (valid values: [filter batch memory_limiter attributes resource span probabilistic_sampler])
2022/03/22 12:30:48 collector server run finished with error: failed to get config: cannot unmarshal the configuration: unknown processors type "queued_retry" for "queued_retry" (valid values: [filter batch memory_limiter attributes resource span probabilistic_sampler])
```
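For anyone reproducing this today: the queued_retry processor has been removed from the collector, so the config at the top of this issue no longer loads. A variant using the batch processor (taken from the valid values listed in the error above) should load:
```
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  mode: sidecar
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
    processors:
      batch:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [jaeger]
          processors: [batch]
          exporters: [logging]
```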
/assign
Hi, the same issue exists for auto-instrumentation, where we get the error message
```
"msg":"failed to select an OpenTelemetry Instrumentation instance for this pod","namespace":"","name":"","error":"no OpenTelemetry Instrumentation instances available"
```
when the operator tries to mutate a pod that runs in a namespace where no `Instrumentation` CR exists. The reason is that `Instrumentation` CRs are only searched for in the same namespace as the pod:
https://github.com/open-telemetry/opentelemetry-operator/blob/ecb139c9dc453961f695a186d665666cdd1dfd3c/pkg/instrumentation/podmutator.go#L124-L126
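As with the collector case earlier in the thread, the namespace-scoped lookup can be confirmed from the CLI (assuming the CRD's plural name `instrumentations`):
```
kubectl get instrumentations -n <pod-namespace>
# "No resources found" here means nothing is eligible for injection
```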
Is this also a use case that will not be supported in the future? Thanks very much
Just adding my 2¢ on this: we're putting together a centralised approach to instrumenting with OTel sidecars, and each team individually controls a namespace or two. Ideally we'd just tell them to add the annotation to inject the sidecar (whose configuration we can then manage centrally). There's also the fact that Istio supports this pattern, to be honest!
I'm happy to pick this up and implement the fix, but I'd want some affirmation that this direction is acceptable. My thinking would be to add an optional field to the CR for namespaces, Istio-style:
```
...
namespaces:
  - "."          # current namespace; default behaviour
  - "*"          # all namespaces, RBAC permitting
  - mynamespace  # explicitly defined
...
```
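For clarity, a sketch of that proposed field in the context of a collector CR; `namespaces` is hypothetical here and not part of the current API:
```
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
  namespace: test1
spec:
  mode: sidecar
  namespaces:   # proposed, Istio-style (hypothetical field)
    - "."       # pods in test1 (default behaviour)
    - test2     # also allow injection into pods in test2
```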
I think this is the direction we want to go, @naphta. Please do.
I agree with @dcw329 - we also want to centrally manage our instrumentation configuration. Go for it, @naphta.
@naphta, could you please check whether #889 covers your proposed change?
@pavolloffay, @jpkrohling, I believe we could close this issue based on #889, correct?
This is the comment https://github.com/open-telemetry/opentelemetry-operator/pull/889#discussion_r886838314
Thinking aloud: the operator could create the config map in the new namespace if needed. Therefore I would keep this issue open.
I already added the annotation `sidecar.opentelemetry.io/inject: 'true'` to the pod, but it seems the sidecar can only be injected into pods in the same namespace as the CR (tested).
Do I need to add any other annotation for the collector sidecar to be injected into the pod? @yuriolisa @pavolloffay @naphta
Or is the only current workaround 1 CR : 1 namespace?