opentelemetry-operator
Sidecar not set up
Hi, I am just starting to work with this operator and OpenTelemetry. I am trying to set up one collector as a sidecar and another as a Deployment. The Deployment is created, but the sidecar isn't, and the operator does not provide any useful logs (not even for the Deployment, actually). Here are my YAMLs:
OpenTelemetry Operator in the operators namespace:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    control-plane: controller-manager
  name: otel-operator
  namespace: operators
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-operator
  template:
    metadata:
      labels:
        app: otel-operator
    spec:
      containers:
      - env:
        - name: ENABLE_WEBHOOKS
          value: "false"
        image: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:v0.41.1
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8081
          initialDelaySeconds: 15
          periodSeconds: 20
        name: manager
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8081
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 64Mi
      serviceAccountName: otel-operator
      terminationGracePeriodSeconds: 10
OpenTelemetry Collectors and application in namespace tracing-1:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol-deployment
spec:
  mode: deployment # Works
  image: otel/opentelemetry-collector:0.41.0
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
      otlp:
        protocols:
          grpc:
          http:
    processors:
    exporters:
      jaeger:
        endpoint: "my-jaeger-collector-headless:14250"
    service:
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: []
          exporters: [jaeger]
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol-sidecar
spec:
  mode: sidecar # Doesn't work
  image: otel/opentelemetry-collector:0.41.0
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
      otlp:
        protocols:
          grpc:
          http:
    processors:
    exporters:
      otlp:
        endpoint: "http://otelcol-deployment:4317"
    service:
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: []
          exporters: [otlp]
And the sample application Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        "sidecar.opentelemetry.io/inject": "otelcol-sidecar" # "true" does not work either
    spec:
      containers:
      - name: myapp
        image: jaegertracing/vertx-create-span:operator-e2e-tests
        ports:
        - containerPort: 8080
          protocol: TCP
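For reference, this is roughly how I have been checking the result (namespace and names taken from the manifests above):
kubectl -n tracing-1 describe pod -l app=myapp
kubectl -n operators logs deploy/otel-operator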
I really don't understand why this doesn't work. Thanks in advance.
Never mind. I now realize the sidecar is injected into the existing container and not as a separate sidecar container, as in the Jaeger Operator. This is quite confusing.
Wait, what? The sidecar is a second container on the same pod. Is this not what you are seeing?
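A quick way to verify is to list the container names on the application pod; with your manifests above, that would be something like:
kubectl -n tracing-1 get pods -l app=myapp -o jsonpath='{.items[*].spec.containers[*].name}'
An injected sidecar shows up as an additional container (named otc-container) next to your application container.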
Here's an e2e test that can be used for reference:
https://github.com/open-telemetry/opentelemetry-operator/blob/main/tests/e2e/smoke-sidecar/01-install-app.yaml
And this is the assertion (note that only a few nodes are asserted, the actual YAML is obviously bigger and more complete)
https://github.com/open-telemetry/opentelemetry-operator/blob/main/tests/e2e/smoke-sidecar/01-assert.yaml
Sorry, I was a little confused. This indeed does not set up a sidecar container, and the test case you provided does not work either. These are the operator logs:
{"level":"info","ts":1642318385.1218407,"msg":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.41.1","opentelemetry-collector":"otel/opentelemetry-collector:0.41.0","opentelemetry-targetallocator":"quay.io/opentelemetry/target-allocator:0.1.0","auto-instrumentation-java":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.7.2","auto-instrumentation-nodejs":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.26.0","auto-instrumentation-python":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.26b1","build-date":"2021-12-22T17:12:32Z","go-version":"go1.17.5","go-arch":"amd64","go-os":"linux"}
{"level":"info","ts":1642318385.1233158,"logger":"setup","msg":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
{"level":"info","ts":1642318386.1691027,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1642318386.1823914,"logger":"setup","msg":"starting manager"}
{"level":"info","ts":1642318386.187153,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1642318386.1931977,"logger":"collector-upgrade","msg":"looking for managed instances to upgrade"}
{"level":"info","ts":1642318386.1991522,"logger":"instrumentation-upgrade","msg":"looking for managed Instrumentation instances to upgrade"}
{"level":"info","ts":1642318386.2004423,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting EventSource","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","source":"kind source: /, Kind="}
{"level":"info","ts":1642318386.2105188,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting EventSource","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","source":"kind source: /, Kind="}
{"level":"info","ts":1642318386.2129228,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting EventSource","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","source":"kind source: /, Kind="}
{"level":"info","ts":1642318386.2132735,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting EventSource","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","source":"kind source: /, Kind="}
{"level":"info","ts":1642318386.213576,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting EventSource","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","source":"kind source: /, Kind="}
{"level":"info","ts":1642318386.2139266,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting EventSource","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","source":"kind source: /, Kind="}
{"level":"info","ts":1642318386.2219572,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting EventSource","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","source":"kind source: /, Kind="}
{"level":"info","ts":1642318386.2220583,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting Controller","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector"}
{"level":"info","ts":1642318389.3165417,"logger":"collector-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":1642318391.4126122,"logger":"instrumentation-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":1642318391.4634194,"logger":"controller-runtime.manager.controller.opentelemetrycollector","msg":"Starting workers","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","worker count":1}
This is completely uninformative and I have no idea what's going on. Any help?
I tried playing with this a little more. The operator does not seem to detect changes to the OpenTelemetryCollector CR at all. I have one running in deployment mode (it did create the Deployment), but when I change the configuration to include an additional exporter, for example, no changes are applied. At least, that is what I understand from the collector logs; I can't exec into the collector to debug it because there is no /bin/sh in the image, but there don't seem to be any changes. When I delete and re-create the collector, the configuration does change. The operator logs are the same as above.
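The way I have been trying to confirm whether the rendered configuration changed is by looking at the ConfigMap the operator generates for the deployment-mode collector (the specific name below is my guess, based on the instance name):
kubectl -n tracing-1 get configmaps
kubectl -n tracing-1 get configmap otelcol-deployment-collector -o yaml  # name is a guess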
Are you able to provide us with your concrete steps and commands to reproduce this using minikube? Given that the test runs on every PR, I'm confident the feature is working, but you might be experiencing this in a situation we are not testing.
I simply applied all the YAMLs above with kubectl. The RBAC was taken from the rbac directory, and the ClusterRoleBinding binds to the correct ServiceAccount in the operators namespace.
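Roughly the following (the file names are just how I saved the YAMLs above):
# file names are illustrative
kubectl apply -n operators -f rbac/
kubectl apply -n operators -f operator.yaml
kubectl apply -n tracing-1 -f collectors.yaml
kubectl apply -n tracing-1 -f app.yaml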
Any news on this? Still not working.
Same issue here. My steps:
- install cert-manager: kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml
- install the operator: kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
- deploy the service: kubectl apply -f deployment.yaml
I have used the deployment from the e2e test:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar-for-my-app
spec:
  mode: sidecar
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
    processors:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [jaeger]
          processors: []
          exporters: [logging]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment-with-sidecar
spec:
  selector:
    matchLabels:
      app: my-pod-with-sidecar
  replicas: 1
  template:
    metadata:
      labels:
        app: my-pod-with-sidecar
      annotations:
        sidecar.opentelemetry.io/inject: "true"
    spec:
      containers:
      - name: myapp
        image: ealen/echo-server
(base) ➜ poc kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-6dd9658548-5kwd9 1/1 Running 9 (5m54s ago) 8d
cert-manager cert-manager-cainjector-5987875fc7-dcnjf 1/1 Running 16 (5m54s ago) 8d
cert-manager cert-manager-webhook-7b4c5f579b-x8l74 1/1 Running 16 (5m ago) 8d
default my-deployment-with-sidecar-fd49d7558-rb494 2/2 Running 0 3m37s
kube-system coredns-6d4b75cb6d-d9np5 1/1 Running 10 (5m54s ago) 19d
kube-system coredns-6d4b75cb6d-nqhl2 1/1 Running 10 (5m54s ago) 19d
kube-system etcd-docker-desktop 1/1 Running 10 (5m54s ago) 19d
kube-system kube-apiserver-docker-desktop 1/1 Running 10 (5m54s ago) 19d
kube-system kube-controller-manager-docker-desktop 1/1 Running 10 (5m54s ago) 19d
kube-system kube-proxy-wc9zm 1/1 Running 10 (5m54s ago) 19d
kube-system kube-scheduler-docker-desktop 1/1 Running 10 (5m54s ago) 19d
kube-system storage-provisioner 1/1 Running 17 (5m3s ago) 19d
kube-system vpnkit-controller 1/1 Running 383 (5m ago) 19d
kubernetes-dashboard dashboard-metrics-scraper-7bfdf779ff-crp8j 1/1 Running 10 (5m54s ago) 19d
kubernetes-dashboard kubernetes-dashboard-6cdd697d84-r7bnk 1/1 Running 11 (5m54s ago) 19d
monitoring grafana-59b48c5bb5-j4j64 1/1 Running 10 (5m54s ago) 19d
opentelemetry-operator-system opentelemetry-operator-controller-manager-79d8468dcf-zxkrf 2/2 Running 3 (5m ago) 26m
otel jaeger-all-in-one-ff896dfc7-7wtlq 1/1 Running 10 (5m54s ago) 12d
otel poc-service-a-68b78d7b76-bfkn5 1/1 Running 1 (5m54s ago) 56m
otel prometheus-6dcdf8dd45-7wwp9 1/1 Running 10 (5m54s ago) 12d
otel zipkin-all-in-one-9668844cb-5ztm5 1/1 Running 10 (5m54s ago) 12d
@keu, could you please confirm the Otel Operator version you have installed?
@yuriolisa thank you for the quick response. How can I check it? I just did:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
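I guess I can read it off the manager image tag (deployment and namespace names taken from my pod listing above), or from the "Starting the OpenTelemetry Operator" line in the manager logs:
kubectl -n opentelemetry-operator-system get deploy opentelemetry-operator-controller-manager -o jsonpath='{.spec.template.spec.containers[*].image}'
kubectl -n opentelemetry-operator-system logs deploy/opentelemetry-operator-controller-manager -c manager | head -n 1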
Same here using 0.54.0. To be more accurate, if I manually kill the pod, the sidecar does get injected after re-creation...
The operator controller says:
error failed to select an OpenTelemetry Collector instance for this pod's sidecar
no OpenTelemetry Collector instances available
github.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle
/workspace/internal/webhookhandler/webhookhandler.go:92
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/webhook.go:146
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:99
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1
/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:40
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2084
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1
/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:101
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2084
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2
/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:76
net/http.HandlerFunc.ServeHTTP
/usr/local/go/src/net/http/server.go:2084
net/http.(*ServeMux).ServeHTTP
/usr/local/go/src/net/http/server.go:2462
net/http.serverHandler.ServeHTTP
/usr/local/go/src/net/http/server.go:2916
net/http.(*conn).serve
/usr/local/go/src/net/http/server.go:1966
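Given that error, the thing to check before the annotated pod is created is that an OpenTelemetryCollector with mode: sidecar already exists in the same namespace, e.g. (the namespace is wherever the application Deployment lives):
kubectl -n default get opentelemetrycollectors.opentelemetry.io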
The operator does not seem to detect changes on the OpentelemetryCollector CR at all.
@edenkoveshi from my understanding that's intended. There is a request for an auto-update feature: https://github.com/open-telemetry/opentelemetry-operator/issues/553
same issue here.
@keu could you explain what issue you are facing? I created a new kind cluster and followed your instructions, and I end up with
default my-deployment-with-sidecar-fd49d7558-rb494 2/2 Running 0 3m37s
It contains:
Containers:
  myapp:
    Container ID:  containerd://a886e3f4498e70b31d8a3d99ea515864ac7ceec58fa70e4039e8d2e8886910ad
    Image:         ealen/echo-server
    Image ID:      docker.io/ealen/echo-server@sha256:bda7884be159b8b5a06a6eb9c7d066d4f9667bed1c68e4ab1564c5f48079b46e
    ...
  otc-container:
    Container ID:  containerd://e5e1747001f9c4b1fad931ceb2a17a3a0df2c9e0fd7285c63b5315ec61a53067
    Image:         ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector:0.56.0
    Image ID:      ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector@sha256:3fe065c828e718b464af35d79cf9017d0f4ca3c0e6f444c515d0ff36c020d41c
    Args:
      --config=/conf/collector.yaml
    ...
Then I changed my configuration with kubectl edit opentelemetrycollectors.opentelemetry.io sidecar-for-my-app.
Operator logs say nothing special:
{"level":"info","ts":1659088512.2854543,"logger":"opentelemetrycollector-resource","msg":"default","name":"sidecar-for-my-app"}
{"level":"info","ts":1659088512.287057,"logger":"opentelemetrycollector-resource","msg":"validate update","name":"sidecar-for-my-app"}
Next, I applied the changes using kubectl rollout restart deployment my-deployment-with-sidecar.
Same here using 0.54.0. To be more accurate, if I manually kill the pod, the sidecar does get injected after re-creation...
@jtama could you confirm that everything was up and running? I tried to do the same using kubectl delete pod, but it didn't produce a panic.
Actually, we found out that the annotated Deployment was created before the sidecar OpenTelemetryCollector CR was. We added a hint to help Helm order the Deployment, and now everything works fine.
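For illustration only (not necessarily what our chart does): one way to get that ordering in Helm is to mark the annotated Deployment as a post-install/post-upgrade hook, so it is only created after the regular chart resources, including the OpenTelemetryCollector CR:
# sketch: run the application Deployment as a Helm hook so it is created
# after the rest of the chart (including the sidecar OpenTelemetryCollector CR)
metadata:
  annotations:
    "helm.sh/hook": post-install,post-upgrade
Note that Helm hook resources are not managed as part of the release, so whether this is appropriate depends on the chart.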
As @jtama confirmed, we can close this issue.