opentelemetry output not sending trace_id and span_id for logs
Bug Report
Describe the bug: the opentelemetry output does not correctly send trace_id and span_id for logs.
To Reproduce
[SERVICE]
Flush 1
Daemon Off
Log_Level info
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 8888
storage.path /tmp/storage
storage.sync normal
storage.checksum off
storage.backlog.mem_limit 5M
# Parsers_File /fluent-bit/etc/parsers.conf
# OpenTelemetry Input for gRPC
[INPUT]
Name opentelemetry
Listen 0.0.0.0
Port 4317
Buffer_Max_Size 10MB
Buffer_Chunk_Size 1MB
Log_Level debug
# Debug filter to print all records
[FILTER]
Name stdout
Match *
# Route logs to Loki using OTLP over HTTP
[OUTPUT]
Name opentelemetry
Match v1_logs
Host loki
Port 3100
Logs_uri /otlp/v1/logs
grpc false
http2 false
Log_response_payload true
Log_Level trace
Workers 4
tls Off
tls.verify Off
logs_trace_id_metadata_key trace_id
logs_span_id_metadata_key span_id
When ingesting logs, here is an example input:
2025-06-05T15:52:05.646989193Z [0] v1_logs: [1749138711.18446744072203624508, {"otlp"=>{"observed_timestamp"=>1749138711578345763, "timestamp"=>1749138711578090835, "severity_number"=>9, "severity_text"=>"INFO", "trace_id"=>"1\"\x0aK\xad\xd9\x11)\xb6\xda\xcfg\x1dp\xefE", "span_id"=>"h\x05\xde0|(\x04\xea", "trace_flags"=>1}}, {"message"=>"API Gateway received provision request for client: unknown, traceId: 31220a4badd91129b6dacf671d70ef45"}]
As you can see, the trace_id and span_id have been ingested correctly by the opentelemetry input, but, as shown later, all resource metadata and structured metadata arrive in Loki correctly except for the trace_id and span_id.
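For what it's worth, the binary bytes in the otlp record above hex-decode to exactly the traceId embedded in the message body, which is consistent with the input side being correct. A quick check (a sketch, assuming bash's printf and xxd are available):

# decode the escaped trace_id bytes from the stdout record to hex
printf '1"\x0aK\xad\xd9\x11)\xb6\xda\xcfg\x1dp\xefE' | xxd -p
# -> 31220a4badd91129b6dacf671d70ef45  (matches "traceId: ..." in the message body)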
Expected behavior
trace_id and span_id should be sent correctly to Loki by the opentelemetry output for logs. I also tested metrics and traces: Prometheus and Tempo ingested them correctly, including trace_id and span_id, without any difficulty.
Screenshots
Your Environment
- Version used: 4.0.3
- Configuration:
- Environment name and version (e.g. Kubernetes? What version?): docker compose
- Server type and version: local
- Operating System and version: Windows 10
- Filters and plugins:
Additional context
- I'm building an observability data platform with a pipeline from Spring Boot applications to Fluent Bit acting as the OpenTelemetry collector, which then forwards to Loki, Tempo, and Prometheus in OTLP format.
- Everything is working correctly except for the trace_id and span_id for the logs.
Questions:
If you remove the following lines, does it work?
logs_trace_id_metadata_key trace_id
logs_span_id_metadata_key span_id
Note that the trace_id is part of the metadata under the otlp key, so it needs to be accessed through the pattern $otlp['trace_id']; likewise for span_id with $otlp['span_id'].
basically:
logs_trace_id_metadata_key $otlp['trace_id']
logs_span_id_metadata_key $otlp['span_id']
Thank you @edsiper for your help. I spent the day retesting all the combinations without success. I also tested every Fluent Bit version from 4.0.3 back to 3.2.7, also without success. To me it looks like a bug. If I can help by providing more inputs or scenarios to reproduce the issue, please don't hesitate to tell me what's needed; I would be more than happy to contribute to the resolution of this issue.
@raedkit do you have a simple OTLP JSON log I can try with your config?
Yes, sure, please find it below. But as I explained, the record is then sent via gRPC (so, I suppose, as protobuf rather than JSON) as input to the data ingestion pipeline in Fluent Bit:
{
"resource":
{
"attributes":
[
{
"key": "deployment.environment",
"value":
{
"stringValue": "local"
}
},
{
"key": "service.name",
"value":
{
"stringValue": "billing-system"
}
},
{
"key": "service.namespace",
"value":
{
"stringValue": "XXX"
}
},
{
"key": "service.version",
"value":
{
"stringValue": "1.0.0"
}
},
{
"key": "telemetry.sdk.language",
"value":
{
"stringValue": "java"
}
},
{
"key": "telemetry.sdk.name",
"value":
{
"stringValue": "opentelemetry"
}
},
{
"key": "telemetry.sdk.version",
"value":
{
"stringValue": "1.43.0"
}
}
]
},
"scopeLogs":
[
{
"scope":
{
"name": "fr.XX.billingsystem.controller.BillingController",
"attributes":
[]
},
"logRecords":
[
{
"timeUnixNano": "1749825837720374499",
"observedTimeUnixNano": "1749825837720693393",
"severityNumber": 9,
"severityText": "INFO",
"body":
{
"stringValue": "Successfully processed billing provision request for client: unknown, traceId: c413d5b5fea3657325ff663320c7fd10"
},
"attributes":
[],
"flags": 1,
"traceId": "c413d5b5fea3657325ff663320c7fd10",
"spanId": "77651bbd5ce0ce99"
}
]
}
]
}
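For a quick replay without gRPC, the record above can also be POSTed as OTLP/HTTP JSON. A minimal sketch, assuming an opentelemetry input (or a Collector) is listening for HTTP on port 4318 (not part of the config above), and that the JSON is saved as otlp-log.json wrapped in a top-level resourceLogs array:

# hypothetical OTLP/HTTP replay; otlp-log.json = {"resourceLogs": [ <the JSON above> ]}
curl -sS -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  --data @otlp-log.json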
@edsiper Sorry to bother you, but have you been able to reproduce the issue on your side?
Please find below a complete scenario to reproduce this bug using grpcurl:
Fluent Bit Config
[SERVICE]
Flush 1
Daemon Off
Log_Level info
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 8888
storage.path /tmp/storage
storage.sync normal
storage.checksum off
storage.backlog.mem_limit 5M
# Parsers_File /fluent-bit/etc/parsers.conf
# OpenTelemetry Input for gRPC
[INPUT]
Name opentelemetry
Listen 0.0.0.0
Port 4317
Buffer_Max_Size 10MB
Buffer_Chunk_Size 1MB
Log_Level debug
# Debug filter to print all records
[FILTER]
Name stdout
Match *
# Route logs to Loki using OTLP over HTTP
[OUTPUT]
Name opentelemetry
Match v1_logs
Host loki
Port 3100
Logs_uri /otlp/v1/logs
grpc false
http2 false
Log_response_payload true
Workers 4
tls Off
tls.verify Off
logs_trace_id_metadata_key $otlp['trace_id']
logs_span_id_metadata_key $otlp['span_id']
And here is the test script in PowerShell:
<#
.SYNOPSIS
Sends a dynamically generated OpenTelemetry log record to a gRPC collector using grpcurl.
.DESCRIPTION
This script dynamically generates a log payload with the current timestamp and a unique
request ID in the message body. It then converts this payload to JSON and pipes it
to grpcurl to send to an OpenTelemetry collector.
.PARAMETER CollectorEndpoint
The address and port of the OpenTelemetry Collector's gRPC receiver.
Defaults to 'localhost:4317'.
.EXAMPLE
.\Send-OtelLog.ps1 -Verbose
Sends a new, unique log to the default collector at 'localhost:4317'.
.EXAMPLE
.\Send-OtelLog.ps1 -CollectorEndpoint "my-collector.example.com:4317"
Sends a new, unique log to a custom collector endpoint.
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$false, HelpMessage="The endpoint of the OpenTelemetry gRPC collector.")]
[string]$CollectorEndpoint = "localhost:4317"
)
# --- SCRIPT BODY ---
# 1. Prerequisite Checks
if (-not (Get-Command grpcurl.exe -ErrorAction SilentlyContinue)) {
Write-Error "grpcurl.exe not found. Please ensure it is installed and its location is in your PATH environment variable."
return
}
$protoImportPath = "./opentelemetry-proto"
if (-not (Test-Path -Path $protoImportPath -PathType Container)) {
Write-Error "The required proto import path '$protoImportPath' was not found in the current directory."
return
}
# 2. Generate Dynamic Values for the Payload
# Get the Unix epoch start time as a DateTimeOffset object.
$unixEpoch = [datetimeoffset]'1970-01-01 00:00:00Z'
# Calculate the current time in nanoseconds since the epoch.
# .NET Ticks are 100-nanosecond intervals, so we multiply by 100.
$timeUnixNano = ([datetimeoffset]::UtcNow - $unixEpoch).Ticks * 100
# Simulate a slight processing delay for the observed time by adding a tiny random number of nanoseconds.
$observedTimeUnixNano = $timeUnixNano + (Get-Random -Minimum 1000 -Maximum 50000)
# Generate a unique request ID for the log message.
$requestId = Get-Random -Minimum 10000 -Maximum 99999
$logMessage = "Successfully processed billing provision request #${requestId} for client: unknown, traceId: c413d5b5fea3657325ff663320c7fd10"
# 3. Build the Payload as a PowerShell Object
# This is a robust way to create complex JSON structures.
Write-Verbose "Generating dynamic payload..."
$payloadObject = [pscustomobject]@{
resource_logs = @(
[pscustomobject]@{
resource = [pscustomobject]@{
attributes = @(
@{ key = 'deployment.environment'; value = @{ stringValue = 'local' } }
@{ key = 'service.name'; value = @{ stringValue = 'billing-system' } }
@{ key = 'service.namespace'; value = @{ stringValue = 'XXX' } }
@{ key = 'service.version'; value = @{ stringValue = '1.0.0' } }
@{ key = 'telemetry.sdk.language'; value = @{ stringValue = 'java' } }
@{ key = 'telemetry.sdk.name'; value = @{ stringValue = 'opentelemetry' } }
@{ key = 'telemetry.sdk.version'; value = @{ stringValue = '1.43.0' } }
)
}
scopeLogs = @(
[pscustomobject]@{
scope = @{ name = 'fr.XX.billingsystem.controller.BillingController'; attributes = @() }
logRecords = @(
[pscustomobject]@{
# Timestamps must be strings in JSON.
timeUnixNano = $timeUnixNano.ToString()
observedTimeUnixNano = $observedTimeUnixNano.ToString()
severityNumber = 9
severityText = 'INFO'
# The body contains our dynamic message.
body = @{ stringValue = $logMessage }
attributes = @()
flags = 1
traceId = 'c413d5b5fea3657325ff663320c7fd10'
spanId = '77651bbd5ce0ce99'
}
)
}
)
}
)
}
# Convert the PowerShell object to a JSON string.
# -Depth 10 is crucial to ensure all nested levels are converted.
$jsonPayload = $payloadObject | ConvertTo-Json -Depth 10 -Compress
# 4. Execute the Command
try {
Write-Verbose "Attempting to send log to OTel Collector at $CollectorEndpoint..."
$grpcurlArgs = @(
"-plaintext",
"-v",
"-d", "@",
"-proto", "opentelemetry-proto/opentelemetry/proto/collector/logs/v1/logs_service.proto",
"-import-path", $protoImportPath,
$CollectorEndpoint,
"opentelemetry.proto.collector.logs.v1.LogsService/Export"
)
# Pipe the generated JSON payload to the grpcurl command.
$jsonPayload | grpcurl.exe @grpcurlArgs
if ($LASTEXITCODE -ne 0) {
throw "grpcurl.exe exited with a non-zero exit code: $LASTEXITCODE. Check the verbose output above for errors."
}
Write-Host "Successfully sent log to OpenTelemetry Collector." -ForegroundColor Green
Write-Host "Log Message Sent: `"$logMessage`"" -ForegroundColor Cyan
}
catch {
Write-Error "An error occurred while executing the grpcurl command: $($_.Exception.Message)"
}
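Usage, assuming the script is saved as Send-OtelLog.ps1 (as in the .EXAMPLE help above) next to a clone of the opentelemetry-proto repository:

.\Send-OtelLog.ps1 -Verbose
.\Send-OtelLog.ps1 -CollectorEndpoint "my-collector.example.com:4317"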
And the same test script in shell format:
#!/usr/bin/env bash
# Exit immediately if a command exits with a non-zero status.
set -e
# Treat unset variables as an error.
set -u
# Pipes will fail if any command in the chain fails.
set -o pipefail
# --- Configuration ---
# Set the Collector Endpoint. Use the first command-line argument ($1) if it's provided,
# otherwise, fall back to the default 'localhost:4317'.
COLLECTOR_ENDPOINT=${1:-"localhost:4317"}
PROTO_IMPORT_PATH="./opentelemetry-proto"
PROTO_FILE="${PROTO_IMPORT_PATH}/opentelemetry/proto/collector/logs/v1/logs_service.proto"
GRPC_SERVICE="opentelemetry.proto.collector.logs.v1.LogsService/Export"
# --- 1. Prerequisite Checks ---
# Check if grpcurl is installed and in the PATH.
if ! command -v grpcurl &> /dev/null; then
echo "Error: grpcurl command not found." >&2
echo "Please install it and ensure it's in your PATH." >&2
exit 1
fi
# Check if the proto directory exists.
if [ ! -d "$PROTO_IMPORT_PATH" ]; then
echo "Error: The required proto import path '$PROTO_IMPORT_PATH' was not found in the current directory." >&2
exit 1
fi
# --- 2. Generate Dynamic Values ---
echo "Generating dynamic payload..."
# Generate timestamps in nanoseconds since the Unix epoch.
# NOTE: This uses GNU `date`. On macOS, you might need to install `coreutils` (`brew install coreutils`)
# and use `gdate` instead of `date`.
# Example for macOS: TIME_UNIX_NANO=$(gdate +%s%N)
TIME_UNIX_NANO=$(date +%s%N)
# Simulate a slight processing delay for the observed time.
OBSERVED_TIME_UNIX_NANO=$((TIME_UNIX_NANO + (RANDOM * 1000))) # Add some random nanoseconds
# Generate a unique request ID for the log message.
REQUEST_ID=$((RANDOM % 90000 + 10000)) # Generates a 5-digit random number
LOG_MESSAGE="Successfully processed billing provision request #${REQUEST_ID} for client: unknown, traceId: c413d5b5fea3657325ff663320c7fd10"
# --- 3. Build the JSON Payload ---
# We use a "here document" (cat <<EOF) to construct the multi-line JSON string.
# Variables like ${TIME_UNIX_NANO} will be substituted with their values.
# Note: For a more robust solution that is less prone to syntax errors, consider using `jq`.
# However, this method has no external dependencies beyond the shell itself.
JSON_PAYLOAD=$(cat <<EOF
{
"resource_logs": [
{
"resource": {
"attributes": [
{ "key": "deployment.environment", "value": { "stringValue": "local" } },
{ "key": "service.name", "value": { "stringValue": "billing-system" } },
{ "key": "service.namespace", "value": { "stringValue": "XXX" } },
{ "key": "service.version", "value": { "stringValue": "1.0.0" } },
{ "key": "telemetry.sdk.language", "value": { "stringValue": "shell" } },
{ "key": "telemetry.sdk.name", "value": { "stringValue": "opentelemetry" } },
{ "key": "telemetry.sdk.version", "value": { "stringValue": "1.0.0" } }
]
},
"scopeLogs": [
{
"scope": {
"name": "com.example.billing.script",
"attributes": []
},
"logRecords": [
{
"timeUnixNano": "${TIME_UNIX_NANO}",
"observedTimeUnixNano": "${OBSERVED_TIME_UNIX_NANO}",
"severityNumber": 9,
"severityText": "INFO",
"body": {
"stringValue": "${LOG_MESSAGE}"
},
"attributes": [],
"flags": 1,
"traceId": "c413d5b5fea3657325ff663320c7fd10",
"spanId": "77651bbd5ce0ce99"
}
]
}
]
}
]
}
EOF
)
# --- 4. Execute the Command ---
echo "Attempting to send log to OTel Collector at ${COLLECTOR_ENDPOINT}..."
# Pipe the JSON payload into grpcurl.
# The `-d @` flag tells grpcurl to read the request body from stdin.
# Using 'printf' is safer than 'echo' for piping arbitrary data.
printf "%s" "$JSON_PAYLOAD" | grpcurl \
-plaintext \
-v \
-d @ \
-proto "$PROTO_FILE" \
-import-path "$PROTO_IMPORT_PATH" \
"$COLLECTOR_ENDPOINT" \
"$GRPC_SERVICE"
# Add a newline for cleaner terminal output after grpcurl's status messages.
echo ""
echo -e "\e[32mSuccessfully sent log to OpenTelemetry Collector.\e[0m"
echo -e "\e[36mLog Message Sent: \"${LOG_MESSAGE}\"\e[0m"
And here is Loki's configuration:
auth_enabled: false
server:
http_listen_port: 3100
common:
path_prefix: /loki
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
replication_factor: 1
ingester:
lifecycler:
address: 127.0.0.1
ring:
kvstore:
store: inmemory
replication_factor: 1
final_sleep: 0s
chunk_idle_period: 5m
chunk_retain_period: 30s
wal:
enabled: true
dir: /loki/wal
schema_config:
configs:
- from: 2021-01-01
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
storage_config:
filesystem:
directory: /loki/chunks
tsdb_shipper:
active_index_directory: /loki/tsdb/active
cache_location: /loki/tsdb/cache
cache_ttl: 24h
limits_config:
volume_enabled: true
retention_period: 168h
allow_structured_metadata: true
# otlp_config:
# resource_attributes:
# attributes_config:
# - action: structured_metadata
# regex: ".*" # keep every resource attribute
# scope_attributes:
# - action: structured_metadata
# regex: ".*" # keep every scope attribute
# log_attributes:
# - action: structured_metadata
# regex: ".*" # keep every log attribute
compactor:
working_directory: /loki/compactor
As you can see, all the structured metadata is stored in Loki except traceId and spanId.
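One way to inspect what Loki actually stored, independent of Grafana (a sketch, assuming logcli is installed and pointed at this Loki instance; the service_name label comes from Loki's default OTLP attribute mapping):

# print recent entries for the service; structured metadata (including trace_id
# and span_id, when stored) is shown alongside each log line
logcli query --addr=http://localhost:3100 --limit=5 '{service_name="billing-system"}'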
I believe I'm facing the same issue. My setup includes Fluent Bit version 4.0.3, installed using Helm chart version 0.49.1. It’s running on MicroK8s version 1.32.3 (revision 8148) on Ubuntu 24.04.2.
My Fluent Bit configuration is very basic:
[SERVICE]
Daemon Off
Flush {{ .Values.flush }}
Log_Level {{ .Values.logLevel }}
Parsers_File /fluent-bit/etc/parsers.conf
Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port {{ .Values.metricsPort }}
Health_Check On
[INPUT]
Name opentelemetry
Listen 0.0.0.0
Port 4318
[OUTPUT]
Name opentelemetry
Match v1_logs
Host loki
Port 3100
Logs_uri /otlp/v1/logs
grpc false
http2 false
tls Off
tls.verify Off
I created a super simple web application using Java 21 and Spring Boot 3.5.3 with spring-boot-starter-web as the only dependency. The application exposes a single HTTP endpoint, and when it's hit, it logs a basic message. I'm running it using opentelemetry-javaagent version 2.17.0 with SDK version 1.51.0, and I'm not setting any additional OpenTelemetry configuration.
I'm using the same pipeline as @raedkit: Spring Boot application → Fluent Bit → Loki → Grafana. The logs flow through correctly.
When I invoke the endpoint, the generated log message successfully appears in the dashboard. The log includes several labels like observed_timestamp, scope_name, service_name, severity_text, telemetry_sdk_language, and others - but it's missing both trace_id and span_id.
However, if I use the exact same setup but send logs to an OpenTelemetry Collector instead of Fluent Bit, the trace_id and span_id are present in the dashboard as expected (a sketch of that pipeline follows below). I didn't set up any custom logic around trace_id and span_id in the OpenTelemetry Collector.
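For comparison, a minimal sketch of that Collector pipeline (assumed shape, not my exact file):

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
exporters:
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp/loki]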
I’ve tried using the suggested logs_trace_id_metadata_key and similar configuration options, but unfortunately, none of them resolved the issue.
WIP
https://github.com/fluent/fluent-bit/pull/10548
fixed in #10548