Delete otelcol CR blocked by error
Component(s)
collector
What happened?
Description
When deleting an OpenTelemetryCollector CR, the operator reports an error that prevents the CR from being deleted correctly.
version: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:0.102.0
Steps to Reproduce
kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: bk
  namespace: opentelemetry-operator-system
spec:
  mode: statefulset # This configuration is omittable.
  replicas: 2
  config: |
    extensions:
      health_check: {}
      memory_ballast:
        size_in_percentage: 40
    receivers:
      otlp/traces:
        protocols:
          grpc:
            endpoint: :14317
    processors:
      batch: {}
      memory_limiter:
        check_interval: 5s
        limit_percentage: 85
        spike_limit_percentage: 25
    exporters:
      prometheus:
        endpoint: :8889
      otlp:
        endpoint: dns:///jaeger-prod-collector-headless.istio-system:4317 # to jaeger
        balancer_name: round_robin
      logging:
        loglevel: info
    connectors:
      asmservicegraph:
        extra_dimensions:
          mesh_id: dev
          cluster_name: business1
        store:
          ttl: 5s
          max_items: 500
    service:
      extensions:
        - health_check
        - memory_ballast
      telemetry:
        logs:
          level: info
        metrics:
          level: detailed
          address: :8888
      pipelines:
        traces:
          receivers: [zipkin, otlp/traces]
          processors: [memory_limiter, batch]
          exporters: [logging, otlp, asmservicegraph]
        metrics/graph:
          receivers: [asmservicegraph]
          exporters: [logging, prometheus]
EOF
Expected Result
Actual Result
Kubernetes Version
v1.28.8
Operator version
0.102.0
Collector version
0.102.1
Environment information
Environment
Log output
{"level":"ERROR","timestamp":"2024-06-26T03:36:21Z","message":"Reconciler error","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","OpenTelemetryCollector":{"name":"bk","namespace":"opentelemetry-operator-system"},"namespace":"opentelemetry-operator-system","name":"bk","reconcileID":"b8cd40dd-52a9-47d6-8a36-d665047721ac","error":"OpenTelemetryCollector.opentelemetry.io \"bk\" is invalid: spec.config.service.pipelines.metrics/graph.processors: Invalid value: \"null\": spec.config.service.pipelines.metrics/graph.processors in body must be of type array: \"null\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
Additional context
No response
From the error message (spec.config.service.pipelines.metrics/graph.processors: Invalid value: "null"), it appears that the reconcile error occurs because processors is not set for the metrics/graph pipeline.
However, judging by how the OTel Collector is normally used and by the official documentation, the operator's handling of this parameter is not entirely consistent with upstream, where a pipeline may omit processors.
configuration example: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#usage
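As a rough illustration of how an unset processors list can end up as null, here is a minimal Go sketch. It is only a sketch under assumptions: the type names strictPipeline and optionalPipeline and the helper encode are hypothetical and not taken from the operator's source, and it assumes the field is serialized with Go's encoding/json. A nil slice without omitempty is encoded as null, which a schema expecting an array would reject, while omitempty drops the field and an explicit empty list is encoded as [].

```go
package main

import (
	"encoding/json"
	"fmt"
)

// strictPipeline is a hypothetical stand-in for a pipeline type whose
// processors field is always serialized, even when it was never set.
type strictPipeline struct {
	Receivers  []string `json:"receivers"`
	Processors []string `json:"processors"`
	Exporters  []string `json:"exporters"`
}

// optionalPipeline is the same hypothetical type with processors marked
// optional via omitempty, so an unset list is left out of the output.
type optionalPipeline struct {
	Receivers  []string `json:"receivers"`
	Processors []string `json:"processors,omitempty"`
	Exporters  []string `json:"exporters"`
}

// encode marshals v to a JSON string and panics if marshalling fails.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	recv := []string{"asmservicegraph"}
	exp := []string{"logging", "prometheus"}

	// Processors left unset (nil slice): serialized as "processors":null.
	fmt.Println(encode(strictPipeline{Receivers: recv, Exporters: exp}))

	// Same input with omitempty: the processors key is omitted entirely.
	fmt.Println(encode(optionalPipeline{Receivers: recv, Exporters: exp}))

	// Explicit empty list: serialized as "processors":[].
	fmt.Println(encode(strictPipeline{Receivers: recv, Processors: []string{}, Exporters: exp}))
}
```

If the operator behaves like the first case, explicitly setting processors: [] on the metrics/graph pipeline might work around the validation error, though I have not verified that against 0.102.0.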
I would like to know if there are any special reasons or considerations for the current code design?
I've submitted a PR in an attempt to address this issue. I'd be keen to hear your thoughts on it.
I recall this having some kind of reasoning behind it. We should fix the bug causing the issue here, but I'm not sure if we should do so by making the attribute optional. @pavolloffay do you recall why we did this?