opentelemetry-operator
OpenTelemetryCollector ClusterIP and Headless Service are indistinguishable to a LabelSelector
Our use-case is running the Prometheus metrics exporter in the OpenTelemetry Operator-created OpenTelemetry Collector Deployment.

To activate this, we expected to create an `OpenTelemetryCollector` object, which comes with a `Service`, and then we create a `ServiceMonitor` which uses a `LabelSelector` to match that `Service`; then Prometheus Operator extracts the `Endpoints` from the selected `Service`, and configures Prometheus to scrape them.
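For illustration, this is the kind of `ServiceMonitor` we expected to write (a minimal sketch; the label value and port name are illustrative, not necessarily what the operator generates):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      # illustrative: any label carried by the operator-created Service
      app.kubernetes.io/managed-by: opentelemetry-operator
  endpoints:
    # named Service port exposing the Prometheus exporter
    - port: prometheus
```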
However, OpenTelemetry Operator creates two Services (ClusterIP and headless) whose labels are identical; only the name and one extra annotation distinguish them. This causes Prometheus to see two scrape targets for each Pod in the Deployment, distinguished only by the resulting `job` and `service` labels (and hence distinct metrics with duplicate data).
The simplest solution I see is to give the headless service an extra, specific label, e.g. `operator.opentelemetry.io/collector-headless-service`; that label could then be matched using `Exists` or `DoesNotExist` to distinguish the two services, as in the sketch below.
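With the proposed label on the headless Service, a `ServiceMonitor` selector could exclude it like this (the `matchLabels` value and port name are again illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: opentelemetry-operator  # illustrative
    matchExpressions:
      # with the proposed label present only on the headless Service,
      # this matches only the ClusterIP Service
      - key: operator.opentelemetry.io/collector-headless-service
        operator: DoesNotExist
  endpoints:
    - port: prometheus
```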
Edit: We were exploring a `PodMonitor` to work around this, but a colleague pointed out to me that the Pods created by OpenTelemetry Operator do not actually declare their ports, and a `PodMonitor` requires a named port to scrape (see the sketch below). So that's a no-go.
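For reference, this is roughly the shape of what we tried (the selector label is illustrative); the `port` field must name a `containerPort` declared on the Pod:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: opentelemetry-operator  # illustrative
  podMetricsEndpoints:
    # must reference a named containerPort; the operator-created Pods
    # declare none, so there is nothing for this to point at
    - port: prometheus
```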
I've got the same problem:
```
2022-08-29T15:52:28.908Z	error	prometheusexporter@v0.56.0/collector.go:237	failed to convert metric otelcol_otelsvc_k8s_pod_added: duplicate label names	{"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.56.0/collector.go:237
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
	github.com/prometheus/client_golang@…/prometheus/registry.go:448
2022-08-29T15:52:28.908Z	error	prometheusexporter@v0.56.0/collector.go:237	failed to convert metric otelcol_receiver_accepted_metric_points: duplicate label names	{"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.56.0/collector.go:237
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
	github.com/prometheus/client_golang@…/prometheus/registry.go:448
```
Resources created:
```
❯ kubectl -n opentelemetry get opentelemetrycollector
NAME                               MODE          VERSION   AGE
opentelemetry-operator-collector   statefulset   0.56.0    62m

❯ kubectl -n opentelemetry get svc
NAME                                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
opentelemetry-operator-collector-collector                   ClusterIP   10.244.28.11    <none>        9090/TCP,4317/TCP,4318/TCP   42m
opentelemetry-operator-collector-collector-headless          ClusterIP   None            <none>        9090/TCP,4317/TCP,4318/TCP   42m
opentelemetry-operator-collector-collector-monitoring        ClusterIP   10.244.30.229   <none>        8888/TCP                     42m
opentelemetry-operator-controller-manager-metrics-service    ClusterIP   10.244.23.112   <none>        8443/TCP,8080/TCP            45m
opentelemetry-operator-webhook-service                       ClusterIP   10.244.24.226   <none>        443/TCP                      45m

❯ kubectl -n opentelemetry get servicemonitors
NAME                               AGE
opentelemetry-operator             34m
opentelemetry-operator-collector   45m
```
I plan to work on this. I'm not sure whether the headless service is always needed; regardless, I think we still need to add a label.
I think the headless service is there because (I read somewhere) there's an advantage to using gRPC's load balancing over k8s' load balancing when using the gRPC exporter to feed the collector. See #595 for discussion/proposed documentation of this, but note that it didn't actually achieve consensus.
For the record, the workaround we used was to change the Prometheus exporter to use port 8888 (and move the internal Prometheus feed to 8889, which doesn't need to be exposed because we scrape it in the same collector to include it in our general metrics feed), and then point the `ServiceMonitor` at the `-collector-monitoring` Service, which already exposes port 8888. The collector config change is sketched below.
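A sketch of that port swap, assuming the classic `service.telemetry.metrics.address` setting; the receiver and pipeline wiring here is illustrative, not our exact config:

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}
exporters:
  prometheus:
    # exporter moved onto 8888, the port the -collector-monitoring
    # Service already exposes
    endpoint: 0.0.0.0:8888
service:
  telemetry:
    metrics:
      # internal telemetry moved off 8888 to 8889; scraped in the same
      # collector, so it needs no Service exposure
      address: 0.0.0.0:8889
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```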
+1 to this. A simple fix would be just changing the `app.kubernetes.io/name` label. For the ClusterIP object, the name is in the format `{}-collector`, and for the headless service the name is `{}-headless`; however, the `app.kubernetes.io/name` label is the same for both objects.
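If the label instead mirrored the object name, a plain `matchLabels` selector would suffice; a hypothetical sketch (the label value assumes the proposed change, which the operator does not make today):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      # hypothetical: assumes distinct per-Service name labels
      app.kubernetes.io/name: opentelemetry-operator-collector-collector
  endpoints:
    - port: prometheus
```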