
High Latency When Using Nginx Ingress with OTel Collector over HTTP

Open oszlak opened this issue 8 months ago • 2 comments

Describe your environment

Environment

- Kubernetes: v1.30 (local setup using kind)
- OTel Collector: v0.121.0
- SDK: 1.31.0
- Nginx Ingress Controller

Description

When using the OpenTelemetry collector directly with port forwarding (4318), the metric export latency is normal (50-150 ms). However, when introducing an Nginx ingress in front of the collector, the latency increases dramatically to 3-5 seconds per export.

Steps to Reproduce

- Simple OTel collector pipeline configuration is used
- OTLP HTTP exporter is being used (port 4318)
- No visible errors in logs, just increased latency
- Using ingress rewrite rule: nginx.ingress.kubernetes.io/rewrite-target: /v1/metrics
- Collector configured as a statefulset with minimal processing (batch processor only)

Troubleshooting Attempted

- Verified that the ingress configuration is correct by confirming metrics are received
- Checked Nginx ingress controller logs for errors or warnings
- Confirmed that other services behind the same ingress controller don't experience similar latency issues
- Used a minimal collector configuration with only the debug exporter
- Noted that the ingress uses the nginx.ingress.kubernetes.io/rewrite-target annotation, which might affect routing (a timing probe sketch follows below)
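
To narrow down where the extra seconds go, a minimal probe like the following can time the same request through both paths. This is a sketch, not part of the reproducer: it assumes the port-forward is running on localhost:4318 and that otel-metrics.local resolves to the kind ingress (e.g. via an /etc/hosts entry); the empty body is sent purely to measure round-trip time, regardless of what status the collector returns.

import time
import requests

ENDPOINTS = {
    "port-forward": "http://localhost:4318/v1/metrics",
    "ingress": "http://otel-metrics.local/v1/metrics",  # assumes a hosts entry for the kind ingress
}

session = requests.Session()  # reuse connections so the proxy is timed, not TCP setup

for name, url in ENDPOINTS.items():
    start = time.perf_counter()
    try:
        # Empty protobuf body; the response status does not matter here, only the timing.
        resp = session.post(
            url,
            data=b"",
            headers={"Content-Type": "application/x-protobuf"},
            timeout=10,
        )
        status = resp.status_code
    except requests.RequestException as exc:
        status = f"error: {exc}"
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed * 1000:.1f} ms (status {status})")

If the gap only shows up on the "ingress" line, the extra time is being added between Nginx and the collector rather than inside the SDK.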

Impact

This latency increase makes using an Nginx ingress in front of the OTel collector impractical for production environments where timely metric export is critical.

# Simple OpenTelemetry Collector configuration
# Just receives metrics on port 4318 and outputs to stdout

global:
  defaultApplicationName: "metrics-local-kind"
  defaultSubsystemName: "metrics-local-kind"
nameOverride: "metrics-local-kind"
fullnameOverride: "metrics-local-kind"
mode: "statefulset"  # Keeping statefulset as in original config

# Disable all presets that we don't need
presets:
  logsCollection:
    enabled: false
  hostMetrics:
    enabled: false
  kubernetesAttributes:
    enabled: false  # Changed to false since we're just printing to stdout
  clusterMetrics:
    enabled: false
  kubeletMetrics:
    enabled: false

configMap:
  create: true

# The core configuration
config:
  exporters:
    # Only using debug exporter to print to stdout
    debug:
      verbosity: detailed  # Print detailed metrics information

  extensions:
    health_check: {}  # Keep health check for monitoring

  processors:
    batch:  # Basic batch processor to efficiently handle metrics
      send_batch_size: 1024
      timeout: "1s"

  receivers:
    otlp:  # OTLP receiver to get metrics
      protocols:
        http:
          endpoint: "0.0.0.0:4318"  # Listen for HTTP OTLP metrics on port 4318

  service:
    extensions:
      - health_check
    pipelines:
      metrics:  # Simple metrics pipeline
        receivers:
          - otlp
        processors:
          - batch
        exporters:
          - debug  # Only export to debug (stdout)

# Container image configuration
image:
  repository: otel/opentelemetry-collector-contrib
  pullPolicy: IfNotPresent
  tag: "0.121.0"  # Keeping your version

command:
  name: otelcol-contrib

# Basic setup for the service account
serviceAccount:
  create: true

# We don't need cluster role
clusterRole:
  create: false

# Restoring statefulset configuration from original
statefulset:
  persistentVolumeClaimRetentionPolicy:
    enabled: true
    whenDeleted: Delete
    whenScaled: Retain
  volumeClaimTemplates:
    - metadata:
        name: queue
      spec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "1Gi"

# Add pod identity as an environment variable for application use
extraVolumeMounts:
  - name: queue
    mountPath: /var/lib/storage/queue

initContainers:
  - name: init-fs
    image: busybox:latest
    command:
      - sh
      - "-c"
      - "chown -R 10001: /var/lib/storage/queue"
    volumeMounts:
      - name: queue
        mountPath: /var/lib/storage/queue

# Enable required ports
ports:
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    protocol: TCP
  metrics:
    enabled: true
    containerPort: 8888
    servicePort: 8888
    protocol: TCP

# Minimal resource requirements
resources:
  limits:
    memory: 200Mi
  requests:
    cpu: 200m
    memory: 200Mi

replicaCount: 1

# Simple ClusterIP service
service:
  type: ClusterIP

# Keeping the ingress configuration
ingress:
  enabled: true
  ingressClassName: nginx  # Matches the NGINX Ingress Controller
  hosts:
    - host: otel-metrics.local  # Dummy host for local testing in Kind
      paths:
        - path: /
          pathType: Prefix
          port: 4318
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /v1/metrics  # Rewrite to OTLP metrics endpoint
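
The values above also expose the collector's internal telemetry port (8888). A quick sanity check, assuming that port is port-forwarded and that the service name follows the fullnameOverride above, is to scrape the collector's own Prometheus endpoint and confirm the OTLP receiver is accepting data points; exact metric names can differ between collector versions, so the filter below is an assumption.

import requests

# Assumes: kubectl port-forward service/metrics-local-kind 8888:8888
text = requests.get("http://localhost:8888/metrics", timeout=5).text
for line in text.splitlines():
    # Metric name is an assumption; look for receiver-side accepted data points.
    if "receiver_accepted_metric_points" in line and not line.startswith("#"):
        print(line)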

Python app

import logging
import random
import time

from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.http import Compression
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.metrics import set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_NAME, Attributes, Resource

logger = logging.getLogger("via_telemetry")

exporter = OTLPMetricExporter(
    endpoint='http://0.0.0.0:4318/v1/metrics',
    timeout=8,  # type: ignore[arg-type]
    compression=Compression("none"),
)
metric_readers = []
reader = PeriodicExportingMetricReader(exporter)
metric_readers.append(reader)
service_attr: Attributes = {SERVICE_NAME: "sdk_test", "team": "o11y"}
service_resource = Resource(attributes=service_attr)
meter_provider = MeterProvider(resource=service_resource, metric_readers=metric_readers)
set_meter_provider(meter_provider)
meter = metrics.get_meter("otel-tests")

process_counter = meter.create_counter(
    name="sdk_counter_tests",
    unit="invocation",
    description="Counts the number of process invocations with large increase",
)

def main():
    counter = 0
    logger.info("Function triggered successfully")
    labels = {
        'env': 'dev',
        'city_id': '123',
    }
    try:
        while True:
            rand_num = random.randrange(1, 10)
            logger.info("Function triggered successfully")
            process_counter.add(rand_num, labels)
            counter += rand_num
            time.sleep(1)
    except KeyboardInterrupt:
        start = time.time()
        meter_provider.force_flush(8000)
        end = time.time()
        print(f"time to flush metrics: {end - start}")
        print(f'total counter: {counter}')

if __name__ == "__main__":
    main()
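
To see whether the 3-5 seconds are spent inside the HTTP call itself, the exporter in the script above can be swapped for a thin timing wrapper. TimedOTLPMetricExporter is a hypothetical helper for this sketch, not an SDK class; it only logs the duration of each call to the real OTLPMetricExporter.export().

import time
import logging

from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter

export_log = logging.getLogger("export_timing")


class TimedOTLPMetricExporter(OTLPMetricExporter):
    """Hypothetical wrapper: logs the wall-clock duration of every metric export."""

    def export(self, metrics_data, timeout_millis=10_000, **kwargs):
        start = time.perf_counter()
        result = super().export(metrics_data, timeout_millis, **kwargs)
        export_log.info(
            "export took %.1f ms (result=%s)",
            (time.perf_counter() - start) * 1000,
            result,
        )
        return result

Constructing this with the same endpoint, timeout, and compression arguments as above and passing it to PeriodicExportingMetricReader shows, per export, how much of the latency sits in the request to the collector.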

What happened?

- With direct port forwarding: 50-150 ms latency
- With Nginx ingress: 3-5 s latency (a 30-100x increase)

Steps to Reproduce

1. Set up a kind cluster with Kubernetes 1.30.
2. Deploy OTel Collector v0.121.0 with a simple pipeline.
3. Test direct export via port forwarding: kubectl port-forward service/otel-collector 4318:4318. Result: export latency is 50-150 ms.
4. Deploy the Nginx ingress controller and configure it to route to the OTel collector.
5. Export metrics through the ingress. Result: export latency increases to 3-5 seconds. (A sketch for switching the reproducer between the two endpoints follows.)
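
To run the same Python reproducer against both targets without editing code, the exporter endpoint can be read from an environment variable. OTLP_PROBE_ENDPOINT is an arbitrary name used only in this sketch; the SDK should also honor the standard OTEL_EXPORTER_OTLP_METRICS_ENDPOINT variable when no endpoint argument is passed to the exporter.

import os

from opentelemetry.exporter.otlp.proto.http import Compression
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter

# OTLP_PROBE_ENDPOINT is a made-up variable for this sketch, defaulting to the port-forward.
endpoint = os.environ.get("OTLP_PROBE_ENDPOINT", "http://localhost:4318/v1/metrics")
exporter = OTLPMetricExporter(
    endpoint=endpoint,
    timeout=8,
    compression=Compression("none"),
)

Running once with the default and once with OTLP_PROBE_ENDPOINT=http://otel-metrics.local/v1/metrics makes the port-forward vs. ingress comparison a one-line change.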

Expected Result

The latency should remain comparable when using an ingress, perhaps with a slight increase but not a 30-100x degradation.

Actual Result

As described above: with direct port forwarding (4318) the metric export latency is 50-150 ms, but through the Nginx ingress it increases to 3-5 seconds per export.

Additional context

No response

Would you like to implement a fix?

None

oszlak · Mar 17 '25, 16:03

Pretty sure this is not the correct repository to report an issue with an ingress controller.

xrmx · Mar 17 '25, 16:03

Thank you @xrmx. Actually, I'm not sure it belongs to the nginx repo either; the delay only occurs when there is an OTel collector behind the ingress. Maybe someone has a clue?

oszlak · Mar 17 '25, 17:03