
Persistent Queue breaks Headers Setter extension

Open · lindeskar opened this issue 5 months ago · 9 comments

Component(s)

exporterhelper

What happened?

Description

Adding sending_queue.storage to the Loki exporter, combined with auth.authenticator: headers_setter, causes the X-Scope-OrgID header to be dropped from requests to Loki.

Steps to Reproduce

  1. Run Loki with auth_enabled: true
  2. Export data using Loki exporter and use headers_setter to set X-Scope-OrgID (everything works as expected)
  3. Enable Persistent Queue via sending_queue.storage (connections fail with HTTP 401)

Using a static header instead of headers_setter works as expected together with sending_queue.storage.
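
Condensed from the full configuration below, the relevant exporter settings are:

exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    auth:
      authenticator: headers_setter   # header is dropped once storage is enabled
    sending_queue:
      enabled: true
      storage: file_storage/queue     # commenting this out makes headers_setter work
    # Workaround: a static header survives the persistent queue
    # headers:
    #   X-Scope-OrgID: "foobar"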

Expected Result

Requests to Loki include the X-Scope-OrgID header, as they did before adding sending_queue.storage.

Actual Result

Requests to Loki do not include the X-Scope-OrgID header.

Loki rejects the requests, and both Loki and the OTel Collector log HTTP 401 errors.

Collector version

0.93.0

Environment information

Environment

Docker Compose with:

  • otel/opentelemetry-collector-contrib:0.93.0
  • grafana/loki:2.9.4
  • telemetrygen @ v0.93.0

OpenTelemetry Collector configuration

---
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        include_metadata: true

exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    auth:
      authenticator: headers_setter
    sending_queue:
      enabled: true
      # Enabling this breaks headers_setter, comment the row to see it work
      storage: file_storage/queue
    # And using a static header works
    # headers:
    #   X-Scope-OrgID: "foobar"

extensions:
  headers_setter:
    headers:
      - action: upsert
        from_context: loki_tenant
        key: X-Scope-OrgID

  file_storage/queue:
    compaction:
      directory: /var/lib/storage/queue
    directory: /var/lib/storage/queue

service:
  extensions:
    - headers_setter
    - file_storage/queue

  pipelines:
    logs:
      receivers:
        - otlp
      processors: []
      exporters:
        - loki

Log output

otelcol-1       | 2024-02-02T15:23:38.851Z      error   exporterhelper/common.go:95     Exporting failed. Dropping data.        {"kind": "exporter", "data_type": "logs", "name": "loki", "error": "not retryable error: Permanent error: HTTP 401 \"Unauthorized\": no org id", "dropped_items": 1}

Additional context

I added the full config used for testing here: https://github.com/lindeskar/otelcol-31018

docker-compose.yaml

---
version: "3"

volumes:
  queue:

services:
  loki:
    image: grafana/loki:2.9.4
    volumes:
      - ./loki.yaml:/etc/loki/loki.yaml
    command: -config.file=/etc/loki/loki.yaml
    ports:
      - "3100:3100"

  otelcol:
    image: otel/opentelemetry-collector-contrib:0.93.0
    volumes:
      - ./otelcol_config.yaml:/etc/otelcol-contrib/config.yaml
      - ./queue:/var/lib/storage/queue
    command:
      - "--config=/etc/otelcol-contrib/config.yaml"
    depends_on:
      - loki

  telemetrygen:
    build: https://github.com/open-telemetry/opentelemetry-collector-contrib.git#v0.93.0:cmd/telemetrygen
    command:
      - logs
      - --otlp-http
      - --otlp-endpoint=otelcol:4318
      - --otlp-header=loki_tenant="foobar"
      - --otlp-insecure
      - --otlp-attributes=foo="bar"
      - --duration=10s
    depends_on:
      - otelcol

loki.yaml

---
auth_enabled: true

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

Test query:

logcli --org-id foobar query '{exporter="OTLP"}'

lindeskar · Feb 02 '24

Pinging code owners:

  • exporter/loki: @gramidt @gouthamve @jpkrohling @mar4uk
  • extension/headerssetter: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] · Feb 02 '24

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • exporter/loki: @gramidt @gouthamve @jpkrohling @mar4uk
  • extension/headerssetter: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] · Apr 03 '24

Hi again,

It looks like this also happens with otlphttpexporter. I will prepare a demo config.

Please advise: should I open a new issue or rewrite this one?
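
For reference, a minimal sketch of what that repro might look like (the otlphttp endpoint is a placeholder; it reuses the headers_setter and file_storage/queue extensions from the configuration above):

exporters:
  otlphttp:
    endpoint: http://backend:4318   # placeholder endpoint
    auth:
      authenticator: headers_setter
    sending_queue:
      enabled: true
      storage: file_storage/queue   # removing this line restores the header, as with the Loki exporter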

lindeskar · May 06 '24

You can use this one.

jpkrohling · May 07 '24

Pinging code owners for extension/storage: @dmitryax @atoulme @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] · May 07 '24

I moved this issue to the core repository based on the latest information. The persistent queue feature is part of the exporter helper.

jpkrohling · May 07 '24

I haven't looked in depth yet, but my guess is that the context.Context holding the request information is lost when the data is added to the persistent queue; that is, the relevant request metadata is never written to disk. This is essentially the same problem as using a component that drops the context, such as the tailsamplingprocessor.
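
To illustrate where that request information lives, here is a sketch using the collector's client package (not the exporterhelper code itself): headers_setter resolves the tenant from client metadata stored in the context, and a context rebuilt for an item read back from the persistent queue no longer carries it.

// Sketch only: how a headers_setter-style lookup depends on the incoming context.
// The client package is real; the surrounding functions are illustrative.
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/collector/client"
)

func tenantFromContext(ctx context.Context) string {
	info := client.FromContext(ctx) // metadata captured by the receiver (include_metadata: true)
	if vals := info.Metadata.Get("loki_tenant"); len(vals) > 0 {
		return vals[0]
	}
	return "" // items read back from disk start from a fresh context, so nothing is found
}

func main() {
	// In-memory path: the receiver's context carries the metadata, so the lookup succeeds.
	ctx := client.NewContext(context.Background(), client.Info{
		Metadata: client.NewMetadata(map[string][]string{"loki_tenant": {"foobar"}}),
	})
	fmt.Println(tenantFromContext(ctx))                  // "foobar"
	fmt.Println(tenantFromContext(context.Background())) // "" -> no X-Scope-OrgID -> HTTP 401
}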

I don't have any good suggestions for solutions yet. Is the incoming request safe to write to disk? Feels like there are security concerns there.

TylerHelmuth · May 07 '24

We'd only have to persist the request headers, which should be fine. But if we document that this isn't a supported operation, I'm fine as well.
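
A sketch of that idea, with hypothetical helper names (persistedItem, capture, and restore are not real exporterhelper APIs; only the client package types are): store the relevant client metadata next to the queued payload and rebuild the context when the item is dequeued.

// Hypothetical sketch of persisting request metadata with a queued item and
// restoring it on dequeue.
package queue

import (
	"context"

	"go.opentelemetry.io/collector/client"
)

// persistedItem is a hypothetical on-disk record: payload plus request headers.
type persistedItem struct {
	Payload  []byte              // serialized telemetry request
	Metadata map[string][]string // e.g. {"loki_tenant": ["foobar"]}
}

// capture copies the metadata keys that extensions like headers_setter need.
func capture(ctx context.Context, payload []byte, keys []string) persistedItem {
	info := client.FromContext(ctx)
	md := make(map[string][]string, len(keys))
	for _, k := range keys {
		if vals := info.Metadata.Get(k); len(vals) > 0 {
			md[k] = vals
		}
	}
	return persistedItem{Payload: payload, Metadata: md}
}

// restore rebuilds a context carrying the persisted metadata, so the
// authenticator can still resolve X-Scope-OrgID after a restart.
func restore(item persistedItem) context.Context {
	return client.NewContext(context.Background(), client.Info{
		Metadata: client.NewMetadata(item.Metadata),
	})
}

Whether metadata like this is safe to write to disk is the open question raised above.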

jpkrohling · May 08 '24

Thanks for moving this and bringing attention to it. Let me know if you need more example configs to reproduce the issue.

lindeskar · May 08 '24