opentelemetry-helm-charts

[operator] Collector fails with featureGate errors when upgrading the operator to chart version 0.68.1

Open jlcrow opened this issue 5 months ago • 25 comments

I performed a routine helm upgrade from chart version 0.65.1 to 0.68.1. After the upgrade, the OpenTelemetry collector that gets created will not start. There are no errors in the operator; the collector itself errors out and goes into CrashLoopBackOff.

otel-prometheus-collector-0                        0/1     CrashLoopBackOff   7 (4m20s ago)   15m
 
Error: invalid argument "-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost" for "--feature-gates" flag: feature gate "confmap.unifyEnvVarExpansion" is stable, can not be disabled
2024/08/28 19:23:44 collector server run finished with error: invalid argument "-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost" for "--feature-gates" flag: feature gate "confmap.unifyEnvVarExpansion" is stable, can not be disabled
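
To make the error easier to read: in a `--feature-gates` value, a leading `-` asks the collector to disable that gate, and the message says the collector refuses because `confmap.unifyEnvVarExpansion` is stable. The sketch below is a hypothetical Python illustration of that check, not the collector's or the operator's actual code. The function names and the `STABLE_GATES` set are assumptions, and the set is populated only from the error message above.

```python
# Hypothetical helper for illustration; not part of the collector or operator.
# STABLE_GATES is an assumption taken from the error message in this issue.
STABLE_GATES = {"confmap.unifyEnvVarExpansion"}


def parse_feature_gates(flag_value):
    """Split a --feature-gates value into (gate_id, enabled) pairs.

    A leading '-' disables a gate; a leading '+' or no prefix enables it.
    """
    gates = []
    for item in flag_value.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("-"):
            gates.append((item[1:], False))
        elif item.startswith("+"):
            gates.append((item[1:], True))
        else:
            gates.append((item, True))
    return gates


def rejected_gates(flag_value, stable=STABLE_GATES):
    """Return the stable gates that the flag value tries to disable."""
    return [
        gate
        for gate, enabled in parse_feature_gates(flag_value)
        if not enabled and gate in stable
    ]


if __name__ == "__main__":
    value = "-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost"
    for gate in rejected_gates(value):
        print(f'feature gate "{gate}" is stable, can not be disabled')
```

Run against the flag value from the log, only the first entry is rejected; the second gate is not in the assumed stable set, so the sketch lets it through.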

Collector config

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-prometheus
  namespace: monitoring
spec:
  mode: statefulset
  podAnnotations:
     sidecar.istio.io/inject: "false"
  targetAllocator:
    serviceAccount: opentelemetry-targetallocator-sa
    enabled: true
    prometheusCR:
      enabled: true
    observability:
      metrics:
        enableMetrics: true
    resources:
      requests:
        memory: 300Mi
        cpu: 300m
      limits:
        memory: 512Mi
        cpu: 500m
  priorityClassName: highest-priority
  resources:
    requests:
      memory: 600Mi
      cpu: 300m
    limits:
      memory: 1Gi
      cpu: 500m
  env:
    - name: K8S_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
  config: |
    processors:
      batch: {}
      memory_limiter:
        check_interval: 5s
        limit_percentage: 90    
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      memory_ballast: {}
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-collector'
            scrape_interval: 10s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]         
            metric_relabel_configs:
            - action: labeldrop
              regex: (id|name)
            - action: labelmap
              regex: label_(.+)
          - job_name: kubernetes-nodes-cadvisor
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            honor_timestamps: true
            kubernetes_sd_configs:
            - role: node
            relabel_configs:
            - source_labels: [__meta_kubernetes_pod_node_name]
              action: replace
              target_label: node
              regex: (.*)
              replacement: $$1         
            - action: labelmap
              regex: __meta_kubernetes_node_label_(.+)
            - replacement: kubernetes.default.svc:443
              target_label: __address__
            - regex: (.+)
              replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
              source_labels:
              - __meta_kubernetes_node_name
              target_label: __metrics_path__
            scheme: https
            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
          - job_name: kube-state-metrics
            kubernetes_sd_configs:
            - role: endpoints
              selectors:
              - role: endpoints
                label: "app.kubernetes.io/name=kube-state-metrics" 
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_service_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_service_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_service_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: exporter_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_service_name
              target_label: service_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: node
            metric_relabel_configs:
            - source_labels: [__name__]
              regex: kube_pod_status_(reason|scheduled|ready)
              action: drop
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    exporters:
      prometheusremotewrite:
        endpoint: https://<internal mimir endpoint>
        retry_on_failure:
          enabled: true
          initial_interval: 1s
          max_interval: 10s
          max_elapsed_time: 30s
    service:
      telemetry:
          metrics:
            address: "0.0.0.0:8888"
            level: basic
          logs:
            level: "warn"  
      extensions:
      - health_check
      - memory_ballast
      pipelines:
        metrics:
          receivers:
          - prometheus
          - otlp
          processors:
          - memory_limiter
          - batch
          exporters:
          - prometheusremotewrite

jlcrow · Aug 28 '24 19:08