
Error updating container image version

Open angelokurtis opened this issue 1 year ago • 2 comments

I'm facing problems when updating the collector version through the CRD. Apparently the label app.kubernetes.io/version is being updated in the Deployment spec.template.metadata.labels while the Service spec.selector is not. It causes application telemetry not to reach the collector's pod after the update rollout is complete.

Steps to reproduce

  1. Install the collector through the CRD OpenTelemetryCollector:
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  image: otel/opentelemetry-collector:0.59.0
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      otlp/jaeger:
        endpoint: jaeger-collector.jaeger.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [ otlp ]
          exporters: [ otlp/jaeger ]
EOF
  2. Update the collector's version from 0.59.0 to 0.60.0:
kubectl patch otelcol simplest --type='json' -p='[{"op": "replace", "path": "/spec/image", "value":"otel/opentelemetry-collector:0.60.0"}]'
  3. Compare the Service selector with the Pod labels (see the illustrative output below):
kubectl get svc simplest-collector -o jsonpath='{.spec.selector}'
kubectl get deploy simplest-collector -o jsonpath='{.spec.template.metadata.labels}'
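
For reference, this is the kind of mismatch step 3 surfaces (illustrative output; the version values follow the image tags from steps 1 and 2, and the full label set is truncated here):

$ kubectl get svc simplest-collector -o jsonpath='{.spec.selector}'
{..., "app.kubernetes.io/version":"0.59.0"}
$ kubectl get deploy simplest-collector -o jsonpath='{.spec.template.metadata.labels}'
{..., "app.kubernetes.io/version":"0.60.0"}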

Alternatively, check your application logs for an error like this:

[otel.javaagent 2022-09-17 18:20:41:732 +0000] [OkHttp http://simplest-collector.otel.svc.cluster.local:4317/...] ERROR io.opentelemetry.exporter.internal.grpc.OkHttpGrpcExporter - Failed to export spans. The request could not be executed. Full error message: Failed to connect to simplest-collector.otel.svc.cluster.local/10.96.60.59:4317

Operator version

otelcol version 0.60.0

Proposal

Maybe we could just use the immutable labels as service selectors (like Deployment does). Another solution would be to simply fix the Service selector by updating its value as well.
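
Until one of those lands, a manual stopgap (untested sketch; the operator will most likely revert it on its next reconciliation) is to patch the Service selector back in sync by hand:

kubectl patch svc simplest-collector --type='json' -p='[{"op": "replace", "path": "/spec/selector/app.kubernetes.io~1version", "value": "0.60.0"}]'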

If it makes sense, I would be happy to contribute 🙂

angelokurtis avatar Sep 17 '22 19:09 angelokurtis

What is the main objective of this ticket?

It causes application telemetry not to reach the collector's pod after the update rollout is complete.

Is it this line? That the auto-instrumentation fails to send data to the collector after image change?

Note that the version of the operator should match the collector version. The image is exposed in the CR to allow running a custom distribution (e.g. contrib) of the same version.
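
For example, assuming the operator is at 0.60.0, pinning the contrib distribution at the matching version would look roughly like this (sketch; pipeline config omitted):

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  # different distribution, same version as the operator
  image: otel/opentelemetry-collector-contrib:0.60.0
  config: |
    # ... same pipeline configuration as above ...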

pavolloffay avatar Sep 20 '22 08:09 pavolloffay

Hi @pavolloffay! Thanks for the support.

Is it this line? That the auto-instrumentation fails to send data to the collector after image change?

Yes, but not just auto-instrumentation. Other manually instrumented applications also fail.

The image is exposed in the CR to allow running a custom distribution (e.g. contrib) of the same version.

In that case, should I just set my distro's repository in spec.image without a tag?

angelokurtis avatar Sep 21 '22 22:09 angelokurtis

I am facing the same issue.

If I create this OpenTelemetryCollector custom resource:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry
spec:
  config: |

    receivers:
      otlp:
        protocols:
          http:

    exporters:
      logging:

    service:
      pipelines:
        traces:
          receivers: [ otlp ]
          exporters: [ logging ]

These are the current Service selector labels:

$ kubectl get services opentelemetry-collector --output json | jq .spec.selector
{
  "app.kubernetes.io/component": "opentelemetry-collector",
  "app.kubernetes.io/instance": "wandersonwhcr.opentelemetry",
  "app.kubernetes.io/managed-by": "opentelemetry-operator",
  "app.kubernetes.io/name": "opentelemetry-collector",
  "app.kubernetes.io/part-of": "opentelemetry",
  "app.kubernetes.io/version": "latest"
}

... and Pod labels:

$ kubectl get pods --output json | jq .items[].metadata.labels
{
  "app.kubernetes.io/component": "opentelemetry-collector",
  "app.kubernetes.io/instance": "wandersonwhcr.opentelemetry",
  "app.kubernetes.io/managed-by": "opentelemetry-operator",
  "app.kubernetes.io/name": "opentelemetry-collector",
  "app.kubernetes.io/part-of": "opentelemetry",
  "app.kubernetes.io/version": "latest",
  "pod-template-hash": "65fb756985"
}

If I retrieve the current container image:

$ kubectl get pods --output json | jq .items[].spec.containers[].image
"ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector:0.60.0"

... and use it in my custom resource:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry
spec:
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector:0.60.0
  config: |

    receivers:
      otlp:
        protocols:
          http:

    exporters:
      logging:

    service:
      pipelines:
        traces:
          receivers: [ otlp ]
          exporters: [ logging ]

After applying it, the Service selector still has the label app.kubernetes.io/version=latest...

$ kubectl get services opentelemetry-collector --output json | jq .spec.selector
{
  "app.kubernetes.io/component": "opentelemetry-collector",
  "app.kubernetes.io/instance": "wandersonwhcr.opentelemetry",
  "app.kubernetes.io/managed-by": "opentelemetry-operator",
  "app.kubernetes.io/name": "opentelemetry-collector",
  "app.kubernetes.io/part-of": "opentelemetry",
  "app.kubernetes.io/version": "latest"
}

... but the Pod label changes to app.kubernetes.io/version=0.60.0:

$ kubectl get pods --output json | jq .items[].metadata.labels
{
  "app.kubernetes.io/component": "opentelemetry-collector",
  "app.kubernetes.io/instance": "wandersonwhcr.opentelemetry",
  "app.kubernetes.io/managed-by": "opentelemetry-operator",
  "app.kubernetes.io/name": "opentelemetry-collector",
  "app.kubernetes.io/part-of": "opentelemetry",
  "app.kubernetes.io/version": "0.60.0",
  "pod-template-hash": "6ddffd76d6"
}
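
A quick way to surface the difference directly (assuming the resource names above) is to diff the Service selector against the Deployment's Pod template labels:

$ diff <(kubectl get services opentelemetry-collector --output json | jq -S .spec.selector) \
       <(kubectl get deployments opentelemetry-collector --output json | jq -S .spec.template.metadata.labels)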

This issue only happens when I change the custom resource by adding the image attribute. If I create it with the image already defined, everything works.

TY @angelokurtis, I was trying to figure out why the collector was not receiving spans after changing to the contrib image... I didn't realize it was a selector problem... Again, TY

wandersonwhcr avatar Oct 01 '22 01:10 wandersonwhcr

Maybe we could just use the immutable labels as service selectors

This would be my preference; there is no need to add the version label to the selector.
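
In other words, the Service selector would carry only the stable labels, roughly (sketch; the instance value follows the <namespace>.<name> pattern shown above):

{
  "app.kubernetes.io/component": "opentelemetry-collector",
  "app.kubernetes.io/instance": "<namespace>.<name>",
  "app.kubernetes.io/managed-by": "opentelemetry-operator",
  "app.kubernetes.io/name": "<name>-collector",
  "app.kubernetes.io/part-of": "opentelemetry"
}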

pavolloffay avatar Oct 03 '22 10:10 pavolloffay

Is anybody volunteering to fix this?

pavolloffay avatar Oct 03 '22 12:10 pavolloffay

I can do this and open a PR soon 👍🏽

angelokurtis avatar Oct 03 '22 14:10 angelokurtis