
Support dynamic input for building up OTLP Resource Attributes (possible Lua use case?)

Status: Open · nefilim opened this issue 4 months ago · 0 comments

For context, consider a host running:

  • Docker
    • configured with the Fluentd logging driver
  • multiple containerized services
  • Fluent Bit
    • Enriches the log with contextual information from the local environment (e.g. container_id)
    • Full pipeline configuration is at the end

Of interest are the containerized services that write log4j/logback JSON logs to stdout.

Below is an example of a log event flowing from a service, through Docker, and through the Fluent Bit pipeline, showing where the gap is. The initial log message produced by a service:

{
    "@timestamp": "2025-06-13T17:23:19.876365974-06:00",
    "@version": "1",
    "message": "latency is 3ms",
    "logger": "org.home4s.lutron.leap.LutronLEAPStream",
    "thread": "io-compute-0",
    "severity": "INFO",
    "level_value": 20000,
    "home4s.bridge": "Lutron",
    "service.name": "home4s",
    "service.version": "0.10",
    "service.namespace": "homelab"
}

The Docker Fluentd logging driver wraps the line as a string under "log" and adds container metadata:

{
    "log": "{\"@timestamp\":\"2025-06-13T17:23:19.876365974-06:00\",\"@version\":\"1\",\"message\":\"latency is 3ms\",\"logger\":\"org.home4s.lutron.leap.LutronLEAPStream\",\"thread\":\"io-compute-0\",\"severity\":\"INFO\",\"level_value\":20000,\"home4s.bridge\":\"Lutron\",\"service.name\":\"home4s\",\"service.version\":\"0.10\",\"service.namespace\":\"homelab\"}",
    "container_id": "3286f4562f95063a7ead6b3ca46c895c2eabded97e1d81c5ed52fd545fd5828e",
    "container_name": "/home4s",
    "source": "stdout"
}
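As a rough illustration of the wrapping step above (not Docker's actual code), the Python sketch below models the record shape the Fluentd logging driver produces: the application's JSON line is not parsed, it travels as an opaque string under "log", next to the container fields the driver adds. The helper name wrap_like_docker_fluentd is hypothetical, and the container ID is shortened for brevity.

```python
import json


def wrap_like_docker_fluentd(line: str, container_id: str, container_name: str) -> dict:
    """Hypothetical model of the record shape sent by Docker's Fluentd driver.

    The stdout line is kept verbatim as a string under "log"; the driver
    adds container metadata alongside it.
    """
    return {
        "log": line,
        "container_id": container_id,
        "container_name": container_name,
        "source": "stdout",
    }


# A (trimmed) logback/log4j JSON line as the service writes it to stdout.
app_line = json.dumps({
    "message": "latency is 3ms",
    "severity": "INFO",
    "service.name": "home4s",
})

record = wrap_like_docker_fluentd(
    app_line,
    container_id="3286f4562f95",  # shortened for brevity
    container_name="/home4s",
)
print(json.dumps(record, indent=2))
```

The key point is that, at this stage, "service.name" is only visible inside the "log" string, so nothing downstream can use it until it is parsed.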

In the Fluent Bit pipeline, a parser processor (JSON parser, Key_Name "log") parses the event, with Reserve_Data: true to keep the container metadata; a modify processor then removes "source":

{
    "@timestamp": "2025-06-13T17:23:19.876365974-06:00",
    "@version": "1",
    "message": "latency is 3ms",
    "logger": "org.home4s.lutron.leap.LutronLEAPStream",
    "thread": "io-compute-0",
    "severity": "INFO",
    "level_value": 20000,
    "home4s.bridge": "Lutron",
    "service.name": "home4s",
    "service.version": "0.10",
    "service.namespace": "homelab",
    "container_id": "3286f4562f95063a7ead6b3ca46c895c2eabded97e1d81c5ed52fd545fd5828e",
    "container_name": "/home4s"
}
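The parse step can be approximated in a few lines of Python. This is a simplified model written for illustration, not Fluent Bit's implementation: parse_log_key mimics a JSON parser with Key_Name log and Reserve_Data true (the raw "log" field is dropped, other fields are kept), and remove_key mimics the modify processor's Remove rule. Both function names are hypothetical.

```python
import json


def parse_log_key(record: dict, key_name: str = "log", reserve_data: bool = True) -> dict:
    """Simplified model of a parser processor using a JSON parser.

    key_name:     field holding the raw JSON string (like Key_Name).
    reserve_data: keep the record's other fields (like Reserve_Data).
    """
    parsed = json.loads(record[key_name])
    if not reserve_data:
        return parsed
    # Keep every other field (container_id, container_name, source);
    # the raw key_name field itself is replaced by its parsed content.
    kept = {k: v for k, v in record.items() if k != key_name}
    return {**parsed, **kept}


def remove_key(record: dict, key: str) -> dict:
    """Simplified model of the modify processor's Remove rule."""
    return {k: v for k, v in record.items() if k != key}


docker_record = {
    "log": '{"message": "latency is 3ms", "severity": "INFO", "service.name": "home4s"}',
    "container_id": "3286f4562f95",
    "container_name": "/home4s",
    "source": "stdout",
}

event = remove_key(parse_log_key(docker_record), "source")
print(event)
# {'message': 'latency is 3ms', 'severity': 'INFO', 'service.name': 'home4s',
#  'container_id': '3286f4562f95', 'container_name': '/home4s'}
```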

Next, the opentelemetry_envelope processor wraps the records in an OTLP structure, and a content_modifier processor populates the OTLP Resource Attributes:

  - name: opentelemetry_envelope
  - name: content_modifier
    context: otel_resource_attributes
    action: upsert
    key: service.name
    value: event.attributes.service.name # <===== THIS IS THE PROBLEM: the value is copied as a literal string

The records then go out through the opentelemetry output; the stdout output prints the following (Fluent Bit format):

Jun 14 21:08:18 home4s fluent-bit[3898584]: GROUP METADATA :
Jun 14 21:08:18 home4s fluent-bit[3898584]: {"schema"=>"otlp", "resource_id"=>0, "scope_id"=>0}
Jun 14 21:08:18 home4s fluent-bit[3898584]: GROUP ATTRIBUTES :
Jun 14 21:08:18 home4s fluent-bit[3898584]: {"resource"=>{"attributes"=>{"service.name"=>"event.attributes.service.name"}}, "scope"=>{}}
Jun 14 21:08:18 home4s fluent-bit[3898584]: [25] home4s:0.10: [[1749935298.081045245, {}], {"@timestamp"=>"2025-06-13T17:23:19.876365974-06:00", "@version"=>"1", "message"=>"latency is 3ms", "logger"=>"org.home4s.lutron.leap.LutronLEAPStream", "thread"=>"io-compute-1", "severity"=>"INFO", "level_value"=>20000, "home4s.bridge"=>"Lutron", "service.name"=>"home4s", "service.version"=>"0.10", "service.namespace"=>"homelab", "container_id"=>"3286f4562f95063a7ead6b3ca46c895c2eabded97e1d81c5ed52fd545fd5828e", "container_name"=>"/home4s"}]

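As the GROUP ATTRIBUTES line shows, the resource attribute service.name is set to the literal string "event.attributes.service.name" rather than "home4s". This feature request is for a dynamic lookup of the content_modifier value, in this case from the log record attributes (supplied in the original log).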
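To make the request concrete, the Python sketch below contrasts the behaviour visible in the stdout output (the value string is stored as-is) with the requested behaviour (the value is resolved from the log record attributes). It is not Fluent Bit code: upsert_literal, upsert_from_attribute, and the from_attribute parameter are hypothetical stand-ins for whatever syntax a dynamic lookup might eventually use.

```python
def upsert_literal(resource_attrs: dict, key: str, value: str) -> dict:
    """Observed behaviour: `value` is stored verbatim, even if it looks like a path."""
    out = dict(resource_attrs)
    out[key] = value
    return out


def upsert_from_attribute(resource_attrs: dict, key: str,
                          log_attrs: dict, from_attribute: str) -> dict:
    """Requested behaviour (hypothetical): copy the value from a log attribute.

    Resource attributes are left unchanged if the source attribute is missing.
    """
    out = dict(resource_attrs)
    if from_attribute in log_attrs:
        out[key] = log_attrs[from_attribute]
    return out


log_attrs = {"service.name": "home4s", "service.version": "0.10"}

print(upsert_literal({}, "service.name", "event.attributes.service.name"))
# {'service.name': 'event.attributes.service.name'}   (matches the stdout output)
print(upsert_from_attribute({}, "service.name", log_attrs, "service.name"))
# {'service.name': 'home4s'}                          (what is wanted)
```

Note that "service.name" in the original log is a single flat key containing a dot, not a nested service → name object, so any lookup syntax would need a way to address such keys unambiguously.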

An alternative approach would be to assign a Tag to each service in the Docker logging driver config, with a corresponding content_modifier and match for each one.

This is not ideal because the configuration must be adjusted manually every time a service is added or removed. Ideally, Fluent Bit would be configured once and scale dynamically as containerized services are added and removed; allowing dynamic lookups in the content_modifier value would make this possible.
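One consequence of a per-record lookup is that records from different services in the same batch can no longer share a single resource. The plain-Python sketch below (not Fluent Bit internals; group_by_resource and the "other-svc" service are hypothetical) shows the grouping that a single, service-agnostic configuration implies: one resource per distinct combination of resource attribute values, created on the fly as new services appear.

```python
from collections import defaultdict


def group_by_resource(events: list[dict], resource_keys: tuple[str, ...]) -> dict:
    """Group log events by the values of the chosen resource attribute keys.

    Each distinct combination becomes one resource (similar in spirit to one
    OTLP ResourceLogs entry); no per-service configuration is required.
    """
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for event in events:
        resource = tuple((k, event.get(k)) for k in resource_keys)
        groups[resource].append(event)
    return dict(groups)


events = [
    {"message": "latency is 3ms", "service.name": "home4s", "service.version": "0.10"},
    {"message": "latency is 4ms", "service.name": "home4s", "service.version": "0.10"},
    # Hypothetical second service, added without touching any configuration.
    {"message": "started", "service.name": "other-svc", "service.version": "1.2"},
]

for resource, records in group_by_resource(events, ("service.name", "service.version")).items():
    print(dict(resource), len(records))
# {'service.name': 'home4s', 'service.version': '0.10'} 2
# {'service.name': 'other-svc', 'service.version': '1.2'} 1
```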

Finally, a question: I never see any metadata in the output records (the empty {} next to the timestamp in the stdout output). How is metadata populated in the pipeline?

Complete pipeline:

service:
  flush: 1
  log_level: info
  parsers_file: parsers.conf

pipeline:
  inputs:
    - name: forward
      listen: 0.0.0.0
      port: 24224

      processors:
        logs:
          - name: parser
            match: '*'
            parser: log4j_json
            key_name: log
            reserve_data: 'true' # keep the docker log driver fields like 'container_id', 'container_name'
          - name: modify
            match: '*'
            Remove: source # .. but remove 'source', it's always 'stdout'
          - name: opentelemetry_envelope

          - name: content_modifier
            context: otel_resource_attributes
            action: upsert
            key: service.name
            value: log_record.attributes.service.name # <===== THIS IS THE PROBLEM: the value is copied as a literal string

  outputs:
    - name: opentelemetry
      match: '*'
      host: otel-collector.saxonmt.casa
      port: 443
      tls: 'On'
      log_response_payload: true
      logs_body_key: message
      logs_severity_text_message_key: severity
      logs_severity_number_message_key: level_value
      logs_body_key_attributes: true
    - name: stdout
#      format: json_lines
      match: '*'

nefilim, Jun 14 '25 21:06