
Delete otelcol CR blocked by error

Open fyuan1316 opened this issue 1 year ago • 3 comments

Component(s)

collector

What happened?

Description

When deleting an OpenTelemetryCollector CR, the operator reports an error that prevents the CR from being deleted correctly.

version: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:0.102.0

Steps to Reproduce

kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: bk
  namespace: opentelemetry-operator-system
spec:
  mode: statefulset # This configuration is omittable.
  replicas: 2
  config: |
    extensions:
      health_check: {}
      memory_ballast:
        size_in_percentage: 40
    receivers:
      otlp/traces:
        protocols:
          grpc:
            endpoint: :14317

    processors:
      batch: {}
      memory_limiter:
        check_interval: 5s
        limit_percentage: 85
        spike_limit_percentage: 25

    exporters:
      prometheus:
        endpoint: :8889
      otlp:
        endpoint: dns:///jaeger-prod-collector-headless.istio-system:4317  # to jaeger
        balancer_name: round_robin
      logging:
        loglevel: info

    connectors:
      asmservicegraph:
        extra_dimensions:
          mesh_id: dev
          cluster_name: business1
        store: 
          ttl: 5s
          max_items: 500 

    service:
      extensions:
        - health_check
        - memory_ballast
      telemetry:
        logs:
          level: info
        metrics:
          level: detailed
          address: :8888
      pipelines:
        traces:
          receivers: [zipkin, otlp/traces]
          processors: [memory_limiter, batch]
          exporters: [logging, otlp, asmservicegraph]
        metrics/graph:
          receivers: [asmservicegraph]
          exporters: [logging, prometheus]


EOF

Expected Result

Actual Result

Kubernetes Version

v1.28.8

Operator version

0.102.0

Collector version

0.102.1

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

Log output

{"level":"ERROR","timestamp":"2024-06-26T03:36:21Z","message":"Reconciler error","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","OpenTelemetryCollector":{"name":"bk","namespace":"opentelemetry-operator-system"},"namespace":"opentelemetry-operator-system","name":"bk","reconcileID":"b8cd40dd-52a9-47d6-8a36-d665047721ac","error":"OpenTelemetryCollector.opentelemetry.io \"bk\" is invalid: spec.config.service.pipelines.metrics/graph.processors: Invalid value: \"null\": spec.config.service.pipelines.metrics/graph.processors in body must be of type array: \"null\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

Additional context

No response

fyuan1316 avatar Jun 26 '24 03:06 fyuan1316

From the error message ... spec.config.service.pipelines.metrics/graph.processors: Invalid value: \"null\" ... it appears that the operator's reconcile error is caused by the metrics/graph pipeline not setting processors in the configuration.
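To illustrate how a missing processors list can turn into the "null" reported above, here is a minimal Go sketch. It assumes the field is serialized from a Go slice without `omitempty`; the `requiredPipeline` and `optionalPipeline` types and the `mustJSON` helper are hypothetical names made up for this example, not the operator's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredPipeline is a hypothetical pipeline type whose processors field is
// always written out, even when the slice is nil.
type requiredPipeline struct {
	Exporters  []string `json:"exporters"`
	Processors []string `json:"processors"`
	Receivers  []string `json:"receivers"`
}

// optionalPipeline is the same hypothetical type, but processors is dropped
// from the output when the slice is empty or nil.
type optionalPipeline struct {
	Exporters  []string `json:"exporters"`
	Processors []string `json:"processors,omitempty"`
	Receivers  []string `json:"receivers"`
}

// mustJSON marshals v to a JSON string and panics if marshaling fails.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Mirrors the metrics/graph pipeline from the report: no processors set.
	exporters := []string{"logging", "prometheus"}
	receivers := []string{"asmservicegraph"}

	// A nil slice without omitempty is encoded as JSON null, which a schema
	// expecting "type: array" would reject.
	fmt.Println(mustJSON(requiredPipeline{Exporters: exporters, Receivers: receivers}))

	// With omitempty the key is left out entirely.
	fmt.Println(mustJSON(optionalPipeline{Exporters: exporters, Receivers: receivers}))
}
```

If this assumption matches what the operator does when it writes the object back, then any update to the CR would fail validation. That would include updates made during deletion, which might explain why the delete is blocked.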

However, judging by how the OTel Collector is used and by its official documentation, requiring processors in the operator is not entirely consistent with upstream, where a pipeline may omit processors.


configuration example: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#usage

Is there a particular reason or consideration behind the current design?

fyuan1316 avatar Jun 26 '24 05:06 fyuan1316

I've submitted a PR in an attempt to address this issue. I'd be keen to hear your thoughts on it.

fyuan1316 avatar Jun 26 '24 06:06 fyuan1316

I recall there being some reasoning behind this. We should fix the bug causing the issue here, but I'm not sure if we should do so by making the attribute optional. @pavolloffay do you recall why we did this?
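For comparison, one way to keep the attribute required while still avoiding a null value might be to default a nil list to an empty one before serializing. The Go sketch below is only an illustration of that idea; the `pipeline` type and the `withDefaults` and `render` helpers are hypothetical and are not taken from the operator's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pipeline is a hypothetical pipeline type in which processors stays a
// required (always serialized) field.
type pipeline struct {
	Exporters  []string `json:"exporters"`
	Processors []string `json:"processors"`
	Receivers  []string `json:"receivers"`
}

// withDefaults returns a copy of p in which a nil processors list is replaced
// by an empty, non-nil slice, so it is encoded as [] rather than null.
func withDefaults(p pipeline) pipeline {
	if p.Processors == nil {
		p.Processors = []string{}
	}
	return p
}

// render marshals p to a JSON string and panics if marshaling fails.
func render(p pipeline) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	p := pipeline{
		Exporters: []string{"logging", "prometheus"},
		Receivers: []string{"asmservicegraph"},
	}
	// After defaulting, processors is an empty array, which satisfies a
	// schema that expects "type: array".
	fmt.Println(render(withDefaults(p)))
}
```

Whether defaulting or making the field optional is the better fit depends on the reasoning asked about above, which this sketch does not try to settle.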

swiatekm avatar Jun 26 '24 10:06 swiatekm