
opentelemetry output not sending trace_id and span_id for logs

Open · raedkit opened this issue 6 months ago · 5 comments

Bug Report

Describe the bug: the opentelemetry output does not correctly send trace_id and span_id.

To Reproduce

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    info
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    8888
    storage.path  /tmp/storage
    storage.sync  normal
    storage.checksum  off
    storage.backlog.mem_limit  5M
#    Parsers_File /fluent-bit/etc/parsers.conf

# OpenTelemetry Input for gRPC
[INPUT]
    Name        opentelemetry
    Listen      0.0.0.0
    Port        4317
    Buffer_Max_Size  10MB
    Buffer_Chunk_Size  1MB
    Log_Level   debug

# Debug filter to print all records
[FILTER]
    Name    stdout
    Match   *

# Route logs to Loki using OTLP over HTTP
[OUTPUT]
    Name        opentelemetry
    Match       v1_logs
    Host        loki
    Port        3100
    Logs_uri    /otlp/v1/logs
    grpc        false
    http2       false
    Log_response_payload true
    Log_Level   trace
    Workers     4
    tls         Off
    tls.verify  Off
    logs_trace_id_metadata_key   trace_id
    logs_span_id_metadata_key    span_id

When ingesting logs, here is an example record as printed by the stdout filter:

2025-06-05T15:52:05.646989193Z [0] v1_logs: [1749138711.18446744072203624508, {"otlp"=>{"observed_timestamp"=>1749138711578345763, "timestamp"=>1749138711578090835, "severity_number"=>9, "severity_text"=>"INFO", "trace_id"=>"1\"\x0aK\xad\xd9\x11)\xb6\xda\xcfg\x1dp\xefE", "span_id"=>"h\x05\xde0|(\x04\xea", "trace_flags"=>1}}, {"message"=>"API Gateway received provision request for client: unknown, traceId: 31220a4badd91129b6dacf671d70ef45"}]

As you can see, the trace_id and span_id were ingested correctly by the opentelemetry input. However, as shown later, all resource metadata and structured metadata arrive in Loki correctly except for the trace_id and span_id.
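
(As a side check, not part of the original report: the raw bytes printed in the stdout record hex-encode to exactly the traceId quoted in the message body, which can be verified with a one-liner.)

# Hex-encode the 16 raw bytes shown in the stdout record above; the output
# matches the traceId embedded in the log message body.
printf '\x31\x22\x0a\x4b\xad\xd9\x11\x29\xb6\xda\xcf\x67\x1d\x70\xef\x45' | xxd -p
# -> 31220a4badd91129b6dacf671d70ef45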

Expected behavior: trace_id and span_id should be sent correctly to Loki by the opentelemetry output for logs. I tested metrics and traces, and Prometheus and Tempo were able to ingest them, including trace_id and span_id, without any difficulty.

Screenshots

(screenshot attached in the original issue)

Your Environment

  • Version used: 4.0.3
  • Configuration:
  • Environment name and version (e.g. Kubernetes? What version?): docker compose
  • Server type and version: local
  • Operating System and version: Windows 10
  • Filters and plugins:

Additional context

  • I'm building an observability data platform with a pipeline from Spring Boot applications to Fluent Bit acting as an OpenTelemetry collector, which then forwards data to Loki, Tempo, and Prometheus in OTLP format.
  • Everything is working correctly except for the trace_id and span_id for the logs.

raedkit avatar Jun 11 '25 14:06 raedkit

questions:

If you remove the following lines, does it work?

    logs_trace_id_metadata_key   trace_id
    logs_span_id_metadata_key    span_id

Note that the trace_id is part of the metadata under the otlp key, so it needs to be accessed through the pattern $otlp['trace_id']; the same applies to span_id with $otlp['span_id'].

basically:

    logs_trace_id_metadata_key   $otlp['trace_id']
    logs_span_id_metadata_key    $otlp['span_id']

edsiper avatar Jun 11 '25 22:06 edsiper

Thank you @edsiper for your help. I spent the day retesting all the combinations, without success. I also tested all Fluent Bit versions from 4.0.3 back to 3.2.7, without success. To me it seems to be a bug. If I can help by providing more inputs or scenarios to reproduce the issue, please don't hesitate to tell me what's needed. I would be more than happy to contribute to resolving this issue.

raedkit avatar Jun 12 '25 16:06 raedkit

@raedkit do you have a simple OTLP JSON log I can try with your config?

edsiper avatar Jun 12 '25 17:06 edsiper

> @raedkit do you have a simple OTLP JSON log I can try with your config?

Yes, sure, please find it below. But as I explained, the record is then sent via gRPC (so I suppose in Protobuf, not JSON) as input to the data ingestion pipeline in Fluent Bit:

{
    "resource":
    {
        "attributes":
        [
            {
                "key": "deployment.environment",
                "value":
                {
                    "stringValue": "local"
                }
            },
            {
                "key": "service.name",
                "value":
                {
                    "stringValue": "billing-system"
                }
            },
            {
                "key": "service.namespace",
                "value":
                {
                    "stringValue": "XXX"
                }
            },
            {
                "key": "service.version",
                "value":
                {
                    "stringValue": "1.0.0"
                }
            },
            {
                "key": "telemetry.sdk.language",
                "value":
                {
                    "stringValue": "java"
                }
            },
            {
                "key": "telemetry.sdk.name",
                "value":
                {
                    "stringValue": "opentelemetry"
                }
            },
            {
                "key": "telemetry.sdk.version",
                "value":
                {
                    "stringValue": "1.43.0"
                }
            }
        ]
    },
    "scopeLogs":
    [
        {
            "scope":
            {
                "name": "fr.XX.billingsystem.controller.BillingController",
                "attributes":
                []
            },
            "logRecords":
            [
                {
                    "timeUnixNano": "1749825837720374499",
                    "observedTimeUnixNano": "1749825837720693393",
                    "severityNumber": 9,
                    "severityText": "INFO",
                    "body":
                    {
                        "stringValue": "Successfully processed billing provision request for client: unknown, traceId: c413d5b5fea3657325ff663320c7fd10"
                    },
                    "attributes":
                    [],
                    "flags": 1,
                    "traceId": "c413d5b5fea3657325ff663320c7fd10",
                    "spanId": "77651bbd5ce0ce99"
                }
            ]
        }
    ]
}
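
As a lighter-weight way to replay this record, the same JSON can be sent over OTLP/HTTP instead of gRPC. A sketch, assuming the object above is saved to payload.json and that Fluent Bit's opentelemetry input also accepts OTLP/HTTP on the configured port:

# Wrap the single resource-logs object in the top-level "resourceLogs" array
# that the /v1/logs endpoint expects, then POST it as OTLP/HTTP JSON.
jq -n --slurpfile rl payload.json '{resourceLogs: $rl}' > wrapped.json
curl -s -X POST http://localhost:4317/v1/logs \
  -H "Content-Type: application/json" \
  --data-binary @wrapped.json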

raedkit avatar Jun 13 '25 14:06 raedkit

@edsiper sorry to bother you, but have you been able to reproduce the issue on your side?

raedkit avatar Jun 16 '25 13:06 raedkit

Please find below a complete scenario to reproduce this bug using grpcurl:

Fluent Bit Config

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    info
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    8888
    storage.path  /tmp/storage
    storage.sync  normal
    storage.checksum  off
    storage.backlog.mem_limit  5M
#    Parsers_File /fluent-bit/etc/parsers.conf

# OpenTelemetry Input for gRPC
[INPUT]
    Name        opentelemetry
    Listen      0.0.0.0
    Port        4317
    Buffer_Max_Size  10MB
    Buffer_Chunk_Size  1MB
    Log_Level   debug

# Debug filter to print all records
[FILTER]
    Name    stdout
    Match   *

# Route logs to Loki using OTLP over HTTP
[OUTPUT]
    Name        opentelemetry
    Match       v1_logs
    Host        loki
    Port        3100
    Logs_uri    /otlp/v1/logs
    grpc        false
    http2       false
    Log_response_payload true
    Workers     4
    tls         Off
    tls.verify  Off
    logs_trace_id_metadata_key   $otlp['trace_id']
    logs_span_id_metadata_key    $otlp['span_id']

And here is the test script in PowerShell:

<#
.SYNOPSIS
    Sends a dynamically generated OpenTelemetry log record to a gRPC collector using grpcurl.
.DESCRIPTION
    This script dynamically generates a log payload with the current timestamp and a unique
    request ID in the message body. It then converts this payload to JSON and pipes it
    to grpcurl to send to an OpenTelemetry collector.
.PARAMETER CollectorEndpoint
    The address and port of the OpenTelemetry Collector's gRPC receiver.
    Defaults to 'localhost:4317'.
.EXAMPLE
    .\Send-OtelLog.ps1 -Verbose
    Sends a new, unique log to the default collector at 'localhost:4317'.
.EXAMPLE
    .\Send-OtelLog.ps1 -CollectorEndpoint "my-collector.example.com:4317"
    Sends a new, unique log to a custom collector endpoint.
#>
[CmdletBinding()]
param(
    [Parameter(Mandatory=$false, HelpMessage="The endpoint of the OpenTelemetry gRPC collector.")]
    [string]$CollectorEndpoint = "localhost:4317"
)

# --- SCRIPT BODY ---

# 1. Prerequisite Checks
if (-not (Get-Command grpcurl.exe -ErrorAction SilentlyContinue)) {
    Write-Error "grpcurl.exe not found. Please ensure it is installed and its location is in your PATH environment variable."
    return
}
$protoImportPath = "./opentelemetry-proto"
if (-not (Test-Path -Path $protoImportPath -PathType Container)) {
    Write-Error "The required proto import path '$protoImportPath' was not found in the current directory."
    return
}

# 2. Generate Dynamic Values for the Payload

# Get the Unix epoch start time as a DateTimeOffset object.
$unixEpoch = [datetimeoffset]'1970-01-01 00:00:00Z'

# Calculate the current time in nanoseconds since the epoch.
# .NET Ticks are 100-nanosecond intervals, so we multiply by 100.
$timeUnixNano = ([datetimeoffset]::UtcNow - $unixEpoch).Ticks * 100

# Simulate a slight processing delay for the observed time by adding a tiny random number of nanoseconds.
$observedTimeUnixNano = $timeUnixNano + (Get-Random -Minimum 1000 -Maximum 50000)

# Generate a unique request ID for the log message.
$requestId = Get-Random -Minimum 10000 -Maximum 99999
$logMessage = "Successfully processed billing provision request #${requestId} for client: unknown, traceId: c413d5b5fea3657325ff663320c7fd10"


# 3. Build the Payload as a PowerShell Object
# This is a robust way to create complex JSON structures.
Write-Verbose "Generating dynamic payload..."
$payloadObject = [pscustomobject]@{
    resource_logs = @(
        [pscustomobject]@{
            resource   = [pscustomobject]@{
                attributes = @(
                    @{ key = 'deployment.environment'; value = @{ stringValue = 'local' } }
                    @{ key = 'service.name'; value = @{ stringValue = 'billing-system' } }
                    @{ key = 'service.namespace'; value = @{ stringValue = 'XXX' } }
                    @{ key = 'service.version'; value = @{ stringValue = '1.0.0' } }
                    @{ key = 'telemetry.sdk.language'; value = @{ stringValue = 'java' } }
                    @{ key = 'telemetry.sdk.name'; value = @{ stringValue = 'opentelemetry' } }
                    @{ key = 'telemetry.sdk.version'; value = @{ stringValue = '1.43.0' } }
                )
            }
            scopeLogs  = @(
                [pscustomobject]@{
                    scope      = @{ name = 'fr.XX.billingsystem.controller.BillingController'; attributes = @() }
                    logRecords = @(
                        [pscustomobject]@{
                            # Timestamps must be strings in JSON.
                            timeUnixNano         = $timeUnixNano.ToString()
                            observedTimeUnixNano = $observedTimeUnixNano.ToString()
                            severityNumber       = 9
                            severityText         = 'INFO'
                            # The body contains our dynamic message.
                            body                 = @{ stringValue = $logMessage }
                            attributes           = @()
                            flags                = 1
                            traceId              = 'c413d5b5fea3657325ff663320c7fd10'
                            spanId               = '77651bbd5ce0ce99'
                        }
                    )
                }
            )
        }
    )
}

# Convert the PowerShell object to a JSON string.
# -Depth 10 is crucial to ensure all nested levels are converted.
$jsonPayload = $payloadObject | ConvertTo-Json -Depth 10 -Compress


# 4. Execute the Command
try {
    Write-Verbose "Attempting to send log to OTel Collector at $CollectorEndpoint..."
    
    $grpcurlArgs = @(
        "-plaintext",
        "-v",
        "-d", "@",
        "-proto", "opentelemetry-proto/opentelemetry/proto/collector/logs/v1/logs_service.proto",
        "-import-path", $protoImportPath,
        $CollectorEndpoint,
        "opentelemetry.proto.collector.logs.v1.LogsService/Export"
    )

    # Pipe the generated JSON payload to the grpcurl command.
    $jsonPayload | grpcurl.exe @grpcurlArgs

    if ($LASTEXITCODE -ne 0) {
        throw "grpcurl.exe exited with a non-zero exit code: $LASTEXITCODE. Check the verbose output above for errors."
    }

    Write-Host "Successfully sent log to OpenTelemetry Collector." -ForegroundColor Green
    Write-Host "Log Message Sent: `"$logMessage`"" -ForegroundColor Cyan
}
catch {
    Write-Error "An error occurred while executing the grpcurl command: $($_.Exception.Message)"
}

And the same test script in shell format:

#!/usr/bin/env bash

# Exit immediately if a command exits with a non-zero status.
set -e
# Treat unset variables as an error.
set -u
# Pipes will fail if any command in the chain fails.
set -o pipefail

# --- Configuration ---

# Set the Collector Endpoint. Use the first command-line argument ($1) if it's provided,
# otherwise, fall back to the default 'localhost:4317'.
COLLECTOR_ENDPOINT=${1:-"localhost:4317"}
PROTO_IMPORT_PATH="./opentelemetry-proto"
PROTO_FILE="${PROTO_IMPORT_PATH}/opentelemetry/proto/collector/logs/v1/logs_service.proto"
GRPC_SERVICE="opentelemetry.proto.collector.logs.v1.LogsService/Export"

# --- 1. Prerequisite Checks ---

# Check if grpcurl is installed and in the PATH.
if ! command -v grpcurl &> /dev/null; then
    echo "Error: grpcurl command not found." >&2
    echo "Please install it and ensure it's in your PATH." >&2
    exit 1
fi

# Check if the proto directory exists.
if [ ! -d "$PROTO_IMPORT_PATH" ]; then
    echo "Error: The required proto import path '$PROTO_IMPORT_PATH' was not found in the current directory." >&2
    exit 1
fi

# --- 2. Generate Dynamic Values ---

echo "Generating dynamic payload..."

# Generate timestamps in nanoseconds since the Unix epoch.
# NOTE: This uses GNU `date`. On macOS, you might need to install `coreutils` (`brew install coreutils`)
# and use `gdate` instead of `date`.
# Example for macOS: TIME_UNIX_NANO=$(gdate +%s%N)
TIME_UNIX_NANO=$(date +%s%N)
# Simulate a slight processing delay for the observed time.
OBSERVED_TIME_UNIX_NANO=$((TIME_UNIX_NANO + (RANDOM * 1000))) # Add some random nanoseconds

# Generate a unique request ID for the log message.
REQUEST_ID=$((RANDOM % 90000 + 10000)) # Generates a 5-digit random number
LOG_MESSAGE="Successfully processed billing provision request #${REQUEST_ID} for client: unknown, traceId: c413d5b5fea3657325ff663320c7fd10"


# --- 3. Build the JSON Payload ---

# We use a "here document" (cat <<EOF) to construct the multi-line JSON string.
# Variables like ${TIME_UNIX_NANO} will be substituted with their values.
# Note: For a more robust solution that is less prone to syntax errors, consider using `jq`.
# However, this method has no external dependencies beyond the shell itself.
JSON_PAYLOAD=$(cat <<EOF
{
    "resource_logs": [
        {
            "resource": {
                "attributes": [
                    { "key": "deployment.environment", "value": { "stringValue": "local" } },
                    { "key": "service.name", "value": { "stringValue": "billing-system" } },
                    { "key": "service.namespace", "value": { "stringValue": "XXX" } },
                    { "key": "service.version", "value": { "stringValue": "1.0.0" } },
                    { "key": "telemetry.sdk.language", "value": { "stringValue": "shell" } },
                    { "key": "telemetry.sdk.name", "value": { "stringValue": "opentelemetry" } },
                    { "key": "telemetry.sdk.version", "value": { "stringValue": "1.0.0" } }
                ]
            },
            "scopeLogs": [
                {
                    "scope": {
                        "name": "com.example.billing.script",
                        "attributes": []
                    },
                    "logRecords": [
                        {
                            "timeUnixNano": "${TIME_UNIX_NANO}",
                            "observedTimeUnixNano": "${OBSERVED_TIME_UNIX_NANO}",
                            "severityNumber": 9,
                            "severityText": "INFO",
                            "body": {
                                "stringValue": "${LOG_MESSAGE}"
                            },
                            "attributes": [],
                            "flags": 1,
                            "traceId": "c413d5b5fea3657325ff663320c7fd10",
                            "spanId": "77651bbd5ce0ce99"
                        }
                    ]
                }
            ]
        }
    ]
}
EOF
)

# --- 4. Execute the Command ---

echo "Attempting to send log to OTel Collector at ${COLLECTOR_ENDPOINT}..."

# Pipe the JSON payload into grpcurl.
# The `-d @` flag tells grpcurl to read the request body from stdin.
# Using 'printf' is safer than 'echo' for piping arbitrary data.
printf "%s" "$JSON_PAYLOAD" | grpcurl \
    -plaintext \
    -v \
    -d @ \
    -proto "$PROTO_FILE" \
    -import-path "$PROTO_IMPORT_PATH" \
    "$COLLECTOR_ENDPOINT" \
    "$GRPC_SERVICE"

# Add a newline for cleaner terminal output after grpcurl's status messages.
echo ""
echo -e "\e[32mSuccessfully sent log to OpenTelemetry Collector.\e[0m"
echo -e "\e[36mLog Message Sent: \"${LOG_MESSAGE}\"\e[0m"

And here is Loki's configuration:

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  replication_factor: 1

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  wal:
    enabled: true
    dir: /loki/wal

schema_config:
  configs:
    - from: 2021-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks
  tsdb_shipper:
    active_index_directory: /loki/tsdb/active
    cache_location: /loki/tsdb/cache
    cache_ttl: 24h

limits_config:
  volume_enabled: true
  retention_period: 168h
  allow_structured_metadata: true
#  otlp_config:
#    resource_attributes:
#      attributes_config:
#        - action: structured_metadata
#          regex: ".*"        # keep every resource attribute
#    scope_attributes:
#      - action: structured_metadata
#        regex: ".*"          # keep every scope attribute
#    log_attributes:
#      - action: structured_metadata
#        regex: ".*"          # keep every log attribute

compactor:
  working_directory: /loki/compactor

As you can see, all the structured metadata is stored in Loki except traceId and spanId:

(screenshot: Loki query showing structured metadata without traceId and spanId)

raedkit avatar Jun 23 '25 15:06 raedkit

I believe I'm facing the same issue. My setup includes Fluent Bit version 4.0.3, installed using Helm chart version 0.49.1. It’s running on MicroK8s version 1.32.3 (revision 8148) on Ubuntu 24.04.2.

My Fluent Bit configuration is very basic:

[SERVICE]
    Daemon Off
    Flush {{ .Values.flush }}
    Log_Level {{ .Values.logLevel }}
    Parsers_File /fluent-bit/etc/parsers.conf
    Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port {{ .Values.metricsPort }}
    Health_Check On

[INPUT]
    Name opentelemetry
    Listen 0.0.0.0
    Port 4318

[OUTPUT]
    Name        opentelemetry
    Match       v1_logs
    Host        loki
    Port        3100
    Logs_uri    /otlp/v1/logs
    grpc        false
    http2       false
    tls         Off
    tls.verify  Off

I created a super simple web application using Java 21 and Spring Boot 3.5.3 with spring-boot-starter-web as the only dependency. The application exposes a single HTTP endpoint, and when it's hit, it logs a basic message. I'm running it using opentelemetry-javaagent version 2.17.0 with SDK version 1.51.0, and I'm not setting any additional OpenTelemetry configuration.
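
For context, wiring such an application to the Fluent Bit input above typically looks like the following (file names and the endpoint host are assumptions; the commenter states no additional OpenTelemetry configuration was set):

# Attach the OpenTelemetry Java agent and point OTLP export at Fluent Bit's
# opentelemetry input on port 4318.
java -javaagent:opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=http://fluent-bit:4318 \
     -jar demo-app.jar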

I’m using the same pipeline as @raedkit: Spring Boot application → Fluent Bit → Loki → Grafana. The logs flow through correctly.

When I invoke the endpoint, the generated log message successfully appears in the dashboard. The log includes several labels like observed_timestamp, scope_name, service_name, severity_text, telemetry_sdk_language, and others - but it's missing both trace_id and span_id.
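
One way to confirm what actually reached Loki is to query its HTTP API directly and inspect what is returned with each entry (the endpoint is Loki's standard query API; the label selector is a generic assumption):

# Fetch a few recent entries; ingested structured metadata keys such as
# trace_id and span_id would appear alongside each log line.
curl -G -s "http://loki:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={service_name=~".+"}' \
  --data-urlencode 'limit=5' | jq '.data.result[0]'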

However, if I use the exact same setup but send logs to an OpenTelemetry Collector instead of Fluent Bit, the trace_id and span_id are present in the dashboard as expected. I didn't set up any custom logic around trace_id and span_id in the OpenTelemetry Collector.
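
For comparison, a minimal Collector configuration for that working pipeline might look like this (a sketch under assumptions; the actual Collector config was not shared):

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlphttp:
    # Loki's native OTLP ingestion prefix; the exporter appends /v1/logs.
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]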

I’ve tried using the suggested logs_trace_id_metadata_key and similar configuration options, but unfortunately, none of them resolved the issue.

stojsavljevic avatar Jun 30 '25 08:06 stojsavljevic

WIP

https://github.com/fluent/fluent-bit/pull/10548

edsiper avatar Jul 06 '25 19:07 edsiper

fixed in #10548

edsiper avatar Jul 08 '25 17:07 edsiper