
OpenTelemetryCollector ClusterIP and Headless Service are indistinguishable to a LabelSelector

TBBle opened this issue 2 years ago • 3 comments

Our use case is running the Prometheus metrics exporter in the OpenTelemetry Collector Deployment created by the OpenTelemetry Operator.

To activate this, we expected to create an OpenTelemetryCollector object (which comes with a Service) and then a ServiceMonitor whose LabelSelector matches that Service; Prometheus Operator would then extract the Endpoints from the selected Service and configure Prometheus to scrape them.
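For reference, this is roughly the wiring we had in mind; the names, labels, and port below are illustrative, not what the operator actually emits:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector-metrics              # hypothetical name
  namespace: opentelemetry
spec:
  selector:
    matchLabels:
      # we assumed a label set that selects exactly one Service per collector
      app.kubernetes.io/managed-by: opentelemetry-operator
      app.kubernetes.io/instance: opentelemetry.my-collector
  endpoints:
    - port: prometheus                       # named Service port carrying the Prometheus exporter
      interval: 30s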

However, OpenTelemetry Operator creates two Services (ClusterIP and headless) with no distinction in their labels; they differ only in name and one extra annotation. This causes Prometheus to see two scrape targets for each Pod in the Deployment, distinguished only by the resulting job and service labels, and hence to produce distinct metric series carrying duplicate data.

The simplest solution I can see is to give the headless Service one extra, specific label, e.g. operator.opentelemetry.io/collector-headless-service; that could then be matched with an Exists or DoesNotExist expression to distinguish the two Services.
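With such a label in place, the ServiceMonitor selector could exclude (or specifically target) the headless Service with a matchExpressions clause, roughly like this sketch, assuming the label key proposed above:

spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: opentelemetry-operator   # illustrative
    matchExpressions:
      - key: operator.opentelemetry.io/collector-headless-service
        operator: DoesNotExist               # or Exists, to pick the headless Service instead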

Edit: We explored using a PodMonitor to work around this, but a colleague pointed out that the Pods created by OpenTelemetry Operator do not actually declare their ports, and a PodMonitor requires a named port to scrape. So that's a no-go.
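For completeness, the PodMonitor we would have liked to write is sketched below; everything in it is hypothetical, and the named port it scrapes is exactly what the operator-created Pods don't declare:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: otel-collector-pods                  # hypothetical name
  namespace: opentelemetry
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector   # illustrative
  podMetricsEndpoints:
    - port: prometheus                       # would need to exist as a named containerPort on the Pod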

TBBle commented on May 28 '22 19:05

I've got the same problem:

2022-08-29T15:52:28.908Z        error   [email protected]/collector.go:237     failed to convert metric otelcol_otelsvc_k8s_pod_added: duplicate label names   {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/collector.go:237
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/[email protected]/prometheus/registry.go:448
2022-08-29T15:52:28.908Z        error   [email protected]/collector.go:237     failed to convert metric otelcol_receiver_accepted_metric_points: duplicate label names {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).Collect
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/collector.go:237
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1
        github.com/prometheus/[email protected]/prometheus/registry.go:448

Resources created:

❯ kubectl -n opentelemetry get opentelemetrycollector
NAME                               MODE          VERSION   AGE
opentelemetry-operator-collector   statefulset   0.56.0    62m
❯ kubectl -n opentelemetry get svc
NAME                                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
opentelemetry-operator-collector-collector                  ClusterIP   10.244.28.11    <none>        9090/TCP,4317/TCP,4318/TCP   42m
opentelemetry-operator-collector-collector-headless         ClusterIP   None            <none>        9090/TCP,4317/TCP,4318/TCP   42m
opentelemetry-operator-collector-collector-monitoring       ClusterIP   10.244.30.229   <none>        8888/TCP                     42m
opentelemetry-operator-controller-manager-metrics-service   ClusterIP   10.244.23.112   <none>        8443/TCP,8080/TCP            45m
opentelemetry-operator-webhook-service                      ClusterIP   10.244.24.226   <none>        443/TCP                      45m
❯ kubectl -n opentelemetry get servicemonitors
NAME                               AGE
opentelemetry-operator             34m
opentelemetry-operator-collector   45m

nlamirault commented on Aug 29 '22 16:08

I plan to work on this. I'm not sure whether the headless service is always needed, but regardless, I think we still need to add a label.

kristinapathak commented on Sep 01 '22 22:09

I think the headless service is there because there's (I read somewhere) an advantage to using gRPC's own load balancing rather than k8s Service load balancing when using the gRPC exporter to feed the collector. See #595 for discussion and proposed documentation of this, but note that it never actually reached consensus.
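For context, the way I understand that advantage: if the sending side points its OTLP gRPC exporter at the headless Service with a dns:/// target, gRPC resolves every Pod IP behind it and can balance across them itself, rather than pinning one long-lived connection to whichever Pod the ClusterIP happened to pick. A sketch, with the endpoint, namespace, and settings all being assumptions for illustration:

exporters:
  otlp:
    # dns:/// makes the gRPC client resolve all A records of the headless
    # Service (one per collector Pod) instead of a single ClusterIP.
    endpoint: dns:///opentelemetry-operator-collector-collector-headless.opentelemetry.svc.cluster.local:4317
    balancer_name: round_robin               # gRPC client-side load balancing, if the collector version in use supports this setting
    tls:
      insecure: true                         # assuming plaintext in-cluster traffic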

For the record, the workaround we used was to move the Prometheus exporter to port 8888 (and the collector's internal Prometheus telemetry to 8889, which doesn't need to be exposed because we scrape it within the same collector to include it in our general metrics feed), and then point the ServiceMonitor at the -collector-monitoring Service, which already exposes port 8888.
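In collector-config terms the port swap looked roughly like this; it's a trimmed-down sketch of our setup, so the receiver and pipeline layout will differ from yours:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otelcol-internal
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:8889']    # the collector's own telemetry, moved off 8888
exporters:
  prometheus:
    endpoint: 0.0.0.0:8888                   # now on the port the -collector-monitoring Service already exposes
service:
  telemetry:
    metrics:
      address: 0.0.0.0:8889                  # internal metrics moved here; scraped locally above
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheus]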

TBBle commented on Sep 02 '22 04:09

+1 to this. A simple fix would be just changing the app.kubernetes.io/name label. For the ClusterIP object the name is in the format {}-collector, and for the headless Service it is {}-headless; however, the app.kubernetes.io/name label is currently the same for both objects.
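That would make the existing labels enough on their own; a sketch of the proposal (not current operator behaviour), with illustrative names:

# ClusterIP Service
metadata:
  name: my-otelcol-collector
  labels:
    app.kubernetes.io/name: my-otelcol-collector
---
# Headless Service
metadata:
  name: my-otelcol-collector-headless
  labels:
    app.kubernetes.io/name: my-otelcol-collector-headless   # proposed: follow the object name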

mingh2 commented on Nov 10 '22 01:11