
Understanding the OpenTelemetry Collector configuration of openobserve-collector

Open jennydaman opened this issue 1 year ago • 2 comments

tl;dr: the values.yaml of openobserve-collector is over-complicated; a simpler setup can be achieved with the upstream OpenTelemetry Collector chart.

I am reviewing the code of openobserve-collector and would like to ask some questions about how it works.

Currently I'm running a Kubernetes cluster with OpenObserve deployed in the monitoring namespace. Instead of using the openobserve-collector chart, I am using the upstream OpenTelemetry collector's chart with presets enabled. The setup can be achieved with a relatively concise helmfile:

repositories:
  - name: open-telemetry
    url: https://open-telemetry.github.io/opentelemetry-helm-charts

releases:
  - name: collector-agent
    namespace: monitoring
    chart: open-telemetry/opentelemetry-collector
    version: 0.111.2
    values:
      - image:
          repository: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-k8s
        mode: daemonset
        presets:
          logsCollection:
            enabled: true
          hostMetrics:
            enabled: true
          kubernetesAttributes:
            enabled: true
            extractAllPodLabels: true
            extractAllPodAnnotations: false
          kubeletMetrics:
            enabled: true
        config: &CONFIG
          receivers:
            kubeletstats:
              insecure_skip_verify: true
          exporters:
            otlp/openobserve:
              endpoint: http://openobserve.monitoring.svc:5081
              headers:
                Authorization: {{
                  printf "%s:%s"
                    (fetchSecretValue "ref+k8s://v1/Secret/monitoring/openobserve-root-user/ZO_ROOT_USER_EMAIL")
                    (fetchSecretValue "ref+k8s://v1/Secret/monitoring/openobserve-root-user/ZO_ROOT_USER_PASSWORD")
                  | b64enc | print "Basic " | quote
                }}
                organization: default
                stream-name: default
              tls:
                insecure: true
          service:
            pipelines:
              logs:
                exporters:
                  - otlp/openobserve
              metrics:
                exporters:
                  - otlp/openobserve
              traces:
                exporters:
                  - otlp/openobserve
        resources: {} # -- snip --

  - name: collector-cluster
    namespace: monitoring
    chart: open-telemetry/opentelemetry-collector
    version: 0.111.2
    values:
      - image:
          repository: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-k8s
        mode: deployment
        replicaCount: 1
        presets:
          clusterMetrics:
            enabled: true
          kubernetesEvents:
            enabled: true
        config: *CONFIG
        resources: {} # -- snip --
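The templated Authorization header above builds a standard HTTP Basic credential from the root user's email and password. The equivalent from a shell, using placeholder credentials (not the real secret values), would be:

```shell
# Build the same "Basic <base64(email:password)>" value that the helmfile
# template produces via printf | b64enc | print "Basic ".
# These credentials are placeholders for illustration only.
EMAIL="root@example.com"
PASSWORD="Complexpass#123"
AUTH="Basic $(printf '%s:%s' "$EMAIL" "$PASSWORD" | base64 -w0)"
echo "$AUTH"
```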

The helmfile.yaml defines two releases. The one called collector-agent runs as a daemonset and handles log collection along with host and kubelet metrics. The generated collector config is obtained with the command:

kubectl get -n monitoring configmap collector-agent-opentelemetry-collector-agent -o jsonpath='{.data.relay}'
This prints the following configuration, as generated by the upstream OpenTelemetry Collector chart:
exporters:
  debug: {}
  otlp/openobserve:
    endpoint: http://openobserve.monitoring.svc:5081
    headers:
      Authorization: Basic ZGV2QGJhYnltcmkub3JnOmNocmlzMTIzNA==
      organization: default
      stream-name: otel-chart
    tls:
      insecure: true
extensions:
  health_check:
    endpoint: ${env:MY_POD_IP}:13133
processors:
  batch: {}
  k8sattributes:
    extract:
      labels:
      - from: pod
        key_regex: (.*)
        tag_name: $$1
      metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.start_time
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
receivers:
  filelog:
    exclude:
    - /var/log/pods/monitoring_collector-agent-opentelemetry-collector*_*/opentelemetry-collector/*.log
    include:
    - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
    - id: container-parser
      max_log_size: 102400
      type: container
    retry_on_failure:
      enabled: true
    start_at: end
  hostmetrics:
    collection_interval: 10s
    root_path: /hostfs
    scrapers:
      cpu: null
      disk: null
      filesystem:
        exclude_fs_types:
          fs_types:
          - autofs
          - binfmt_misc
          - bpf
          - cgroup2
          - configfs
          - debugfs
          - devpts
          - devtmpfs
          - fusectl
          - hugetlbfs
          - iso9660
          - mqueue
          - nsfs
          - overlay
          - proc
          - procfs
          - pstore
          - rpc_pipefs
          - securityfs
          - selinuxfs
          - squashfs
          - sysfs
          - tracefs
          match_type: strict
        exclude_mount_points:
          match_type: regexp
          mount_points:
          - /dev/*
          - /proc/*
          - /sys/*
          - /run/k3s/containerd/*
          - /var/lib/docker/*
          - /var/lib/kubelet/*
          - /snap/*
      load: null
      memory: null
      network: null
  jaeger:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:14250
      thrift_compact:
        endpoint: ${env:MY_POD_IP}:6831
      thrift_http:
        endpoint: ${env:MY_POD_IP}:14268
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: ${env:K8S_NODE_IP}:10250
    insecure_skip_verify: true
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318
  prometheus:
    config:
      scrape_configs:
      - job_name: opentelemetry-collector
        scrape_interval: 10s
        static_configs:
        - targets:
          - ${env:MY_POD_IP}:8888
  zipkin:
    endpoint: ${env:MY_POD_IP}:9411
service:
  extensions:
  - health_check
  pipelines:
    logs:
      exporters:
      - otlp/openobserve
      processors:
      - k8sattributes
      - memory_limiter
      - batch
      receivers:
      - otlp
      - filelog
    metrics:
      exporters:
      - otlp/openobserve
      processors:
      - k8sattributes
      - memory_limiter
      - batch
      receivers:
      - otlp
      - prometheus
      - hostmetrics
      - kubeletstats
    traces:
      exporters:
      - otlp/openobserve
      processors:
      - k8sattributes
      - memory_limiter
      - batch
      receivers:
      - otlp
      - jaeger
      - zipkin
  telemetry:
    metrics:
      address: ${env:MY_POD_IP}:8888

Here is an example log entry from OpenObserve using the above upstream OpenTelemetry collector chart:

{
  "_timestamp": 1737473264323746,
  "app": "openobserve",
  "apps_kubernetes_io_pod_index": "0",
  "body": "2025-01-21T15:27:44.323513488+00:00 INFO actix_web::middleware::logger: 172.18.0.4 \"GET /api/default/otel_chart/_values?fields=k8s_container_name&size=10&start_time=1737472364215000&end_time=1737473264215000&sql=U0VMRUNUICogRlJPTSAib3RlbF9jaGFydCIg&type=logs HTTP/1.1\" 200 250 \"-\" \"http://localhost:32020/web/logs?stream_type=logs&stream=otel_chart&period=15m&refresh=0&sql_mode=false&query=YXBwX2t1YmVybmV0ZXNfaW9fbmFtZSA9ICdjaHJpcy13b3JrZXItbWFpbnMn&type=stream_explorer&defined_schemas=user_defined_schema&org_identifier=default&quick_mode=false&show_histogram=true\" \"Mozilla/5.0 (X11; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0\" 0.099962",
  "controller_revision_hash": "openobserve-69f6d688f6",
  "dropped_attributes_count": 0,
  "k8s_container_name": "openobserve",
  "k8s_container_restart_count": "1",
  "k8s_namespace_name": "monitoring",
  "k8s_node_name": "khris-worker",
  "k8s_pod_name": "openobserve-0",
  "k8s_pod_start_time": "2025-01-20T22:14:56Z",
  "k8s_pod_uid": "1c857c0a-066e-40ba-8676-6c874631f1ca",
  "k8s_statefulset_name": "openobserve",
  "log_file_path": "/var/log/pods/monitoring_openobserve-0_1c857c0a-066e-40ba-8676-6c874631f1ca/openobserve/1.log",
  "log_iostream": "stdout",
  "logtag": "F",
  "name": "openobserve",
  "severity": 0,
  "statefulset_kubernetes_io_pod_name": "openobserve-0"
}

Meanwhile, openobserve-collector's default values.yaml specifies complex routing and regular expressions with named capture groups to extract metadata from log file names:

https://github.com/openobserve/openobserve-helm-chart/blob/b146f802fbd5305c00eba85dc8fa8683680ae3dc/charts/openobserve-collector/values.yaml#L130-L170

Given that the upstream chart's config can produce logs with the metadata k8s_pod_name, k8s_namespace_name, etc. (via the k8sattributes processor) using a much simpler configuration, why does openobserve-collector's values.yaml still carry these regexes?
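For comparison, the part of the generated config that actually produces those k8s_* fields is small. A trimmed sketch of the k8sattributes processor (reduced from the full generated config above, not the chart's exact values):

```yaml
processors:
  k8sattributes:
    passthrough: false
    pod_association:
      # Match telemetry to pods by IP, then UID, then the source connection.
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: connection
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.node.name
```

No file-path regexes are needed; the processor resolves pod metadata against the Kubernetes API.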

jennydaman avatar Jan 21 '25 16:01 jennydaman

why does openobserve-collector's values.yaml have these regexes?

There is no reason this could not be updated. The upstream collector helm chart replaced this configuration once the filelog receiver started using the new container parser (stanza) operator, introduced with the collector 0.102 release (this chart defaults to 0.113).
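The container parser in question is visible in the generated config earlier in this thread. A minimal filelog receiver using it looks roughly like this (a sketch, not the chart's exact values):

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    include_file_path: true
    start_at: end
    operators:
      # The container operator handles containerd, CRI-O and Docker log
      # formats and derives k8s.pod.name, k8s.namespace.name, the container
      # name, etc. from the log file path, replacing the hand-written
      # regexes and routing in openobserve-collector's values.yaml.
      - id: container-parser
        type: container
```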

ocraviotto avatar Mar 01 '25 17:03 ocraviotto

I've also been in and out of this helm chart recently, as it's out of date in several respects; I would appreciate some movement on this.

m3ac-AllbrittenJ avatar Apr 02 '25 17:04 m3ac-AllbrittenJ