Failed to scrape Prometheus endpoint - 401 Unauthorized

Open alita1991 opened this issue 7 months ago • 5 comments

Component(s)

No response

What happened?

Description

I'm encountering an Unauthorized error when collecting MongoDB metrics from the secured endpoint. If I try via curl, everything works fine.

ServiceMonitor

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mongodb-sm
  namespace: mongodb
spec:
  endpoints:
  - basicAuth:
      password:
        key: password
        name: mongodb-prometheus-auth
      username:
        key: username
        name: mongodb-prometheus-auth
    port: prometheus
    scheme: http
  namespaceSelector:
    matchNames:
    - mongodb
  selector:
    matchLabels:
      app: mongodb-replicaset-one-member-svc

Collector

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  name: otel-collector
  namespace: observability
spec:
  config:
    exporters:
      debug:
        verbosity: basic
      prometheusremotewrite:
        endpoint: http://mimir-nginx:80/api/v1/push
        resource_to_telemetry_conversion:
          enabled: true
    processors:
      batch:
        send_batch_size: 8192
      memory_limiter:
        check_interval: 1s
        limit_percentage: 50
        spike_limit_percentage: 10
      resource/remove_container_id:
        attributes:
        - action: delete
          key: container.id
      resourcedetection/env:
        detectors:
        - env
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            max_recv_msg_size_mib: 20
      prometheus:
        config:
          global:
            scrape_interval: 30s
          scrape_configs:
          - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            job_name: probes
            kubernetes_sd_configs:
            - follow_redirects: true
              kubeconfig_file: ""
              role: node
            relabel_configs:
            - replacement: kubernetes.default.svc.cluster.local:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$${1}/proxy/metrics/probes
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
              server_name: kubernetes
          - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            job_name: cadvisor
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - replacement: kubernetes.default.svc.cluster.local:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
              server_name: kubernetes
          - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            job_name: kubelet
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - replacement: kubernetes.default.svc.cluster.local:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$${1}/proxy/metrics
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
              server_name: kubernetes
          - job_name: kubernetes-pods
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: (.+?)(?::\d+)?;(\d+)
              replacement: $${1}:$${2}
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kube_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: pod
          - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            job_name: apiserver
            kubernetes_sd_configs:
            - role: endpoints
            relabel_configs:
            - action: keep
              regex: default;kubernetes;https
              source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
              server_name: kubernetes
    service:
      pipelines:
        metrics:
          exporters:
          - prometheusremotewrite
          - debug
          processors:
          - resourcedetection/env
          - resource/remove_container_id
          - memory_limiter
          - batch
          receivers:
          - otlp
          - prometheus
      telemetry:
        logs:
          level: debug
        metrics:
          address: 0.0.0.0:8888
          level: detailed
  configVersions: 3
  daemonSetUpdateStrategy: {}
  deploymentUpdateStrategy: {}
  imagePullPolicy: Always
  ingress:
    route: {}
  managementState: managed
  mode: statefulset
  observability:
    metrics: {}
  podSecurityContext:
    seccompProfile:
      type: RuntimeDefault
  ports:
  - name: metrics
    port: 4317
    protocol: TCP
  replicas: 2
  serviceAccount: collector-sa
  targetAllocator:
    allocationStrategy: consistent-hashing
    enabled: true
    filterStrategy: relabel-config
    observability:
      metrics: {}
    prometheusCR:
      enabled: true
      podMonitorSelector: {}
      scrapeInterval: 30s
      serviceMonitorSelector: {}
    serviceAccount: collector-sa
  upgradeStrategy: automatic

Scrape config

"serviceMonitor/mongodb/mongodb-sm/0": {
    "basic_auth": {
      "password": "<secret>",
      "username": "prometheus"
    },
    "enable_compression": true,
    "enable_http2": true,
    "follow_redirects": true,
    "honor_timestamps": true,
    "job_name": "serviceMonitor/mongodb/mongodb-sm/0",
    "kubernetes_sd_configs": [
      {
        "enable_http2": true,
        "follow_redirects": true,
        "kubeconfig_file": "",
        "namespaces": {
          "names": [
            "mongodb"
          ],
          "own_namespace": false
        },
        "role": "endpointslice"
      }
    ],
    "metrics_path": "/metrics",
    ....
}

Steps to Reproduce

Scrape a protected metrics endpoint, which is defined via a ServiceMonitor.

Expected Result

The MongoDB scrape should succeed, as it does when I query the endpoint via curl with port-forwarding:

kubectl port-forward svc/mongodb-replicaset-one-member-svc 9216:9216 -n mongodb
curl localhost:9216/metrics -u prometheus:password
# HELP hardware_disk_metrics_disk_space_free_bytes
# TYPE hardware_disk_metrics_disk_space_free_bytes gauge
hardware_disk_metrics_disk_space_free_bytes{disk_name="nvme0n1p1"} 3.811151872e+10
# HELP hardware_disk_metrics_disk_space_used_bytes The disk space used in the mounted file system.
# TYPE hardware_disk_metrics_disk_space_used_bytes gauge
hardware_disk_metrics_disk_space_used_bytes{disk_name="nvme0n1p1"} 1.3723582464e+10
# HELP hardware_disk_metrics_read_count The number of read I/Os processed.
# TYPE hardware_disk_metrics_read_count counter

Actual Result

2025-05-20T20:35:52.126Z    info  Metrics {"resource metrics": 129, "metrics": 920, "data points": 958}
2025-05-20T20:35:52.448Z    debug  Scrape failed  {"scrape_pool": "serviceMonitor/mongodb/mongodb-sm/0", "target": "http://10.0.4.211:9216/metrics", "err": "server returned HTTP status 401 Unauthorized"}
2025-05-20T20:35:52.448Z    warn  internal/transaction.go:137   Failed to scrape Prometheus endpoint  {"scrape_timestamp": 1747773352446, "target_labels": "{_name_=\"up\", endpoint=\"prometheus\", instance=\"10.0.4.211:9216\", job=\"mongodb-replicaset-one-member-svc\", namespace=\"mongodb\", pod=\"mongodb-replicaset-one-member-0\", service=\"mongodb-replicaset-one-member-svc\"}"}

Kubernetes Version

v1.29.9+k3s1

Operator version

0.124.0

Collector version

0.124.1

Environment information

No response

Log output


Additional context

No response

alita1991 May 21 '25 09:05

Are you using a service account with the proper permission?

iblancasa May 21 '25 10:05

> Are you using a service account with the proper permission?

Yes, I have a service account with the proper permissions

alita1991 May 21 '25 13:05

Can you try enabling the operator.targetallocator.mtls feature gate for the operator? Authenticated endpoints are not supported without that flag. See https://github.com/open-telemetry/opentelemetry-operator/tree/main/cmd/otel-allocator#service--pod-monitor-endpoint-credentials for more information.
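
For illustration, here is a minimal sketch of passing the gate to the operator through its `--feature-gates` flag. The Deployment name, namespace, and container name are assumptions based on a default manifest install and may differ in your cluster. This is a fragment: the flag should be added to the existing `args` rather than replacing them.

```yaml
# Sketch only: resource names below are assumed defaults, not taken from this issue.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opentelemetry-operator-controller-manager   # assumed default Deployment name
  namespace: opentelemetry-operator-system          # assumed default namespace
spec:
  template:
    spec:
      containers:
      - name: manager                               # assumed container name
        args:
        # Keep the existing args and append the feature gate named in this thread.
        - --feature-gates=operator.targetallocator.mtls
```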

swiatekm May 21 '25 15:05

I enabled the operator.targetallocator.mtls feature gate, but I still have issues. The operator is running with webhooks disabled and without cert-manager.

alita1991 May 27 '25 07:05

Unfortunately, this only works with cert-manager enabled at the moment. Otherwise, the target allocator does not support authenticated Prometheus endpoints.
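
If the operator is installed with the community Helm chart, the two requirements mentioned in this thread (cert-manager and the mTLS feature gate) might be expressed roughly as follows. The value keys are assumptions and should be checked against the chart version in use. cert-manager itself must already be installed in the cluster.

```yaml
# values.yaml fragment (sketch; key names are assumed and may vary between chart versions)
admissionWebhooks:
  certManager:
    enabled: true            # let cert-manager issue the certificates the operator needs
manager:
  # Feature gate named in this thread; some chart versions may expect a map-style key instead.
  featureGates: operator.targetallocator.mtls
```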

swiatekm May 27 '25 07:05