
ES-output: messages get duplicated (v1.8.x, v1.9.x)

kruftik opened this issue 3 years ago • 3 comments

Bug Report

Describe the bug

We have noticed lots of error messages in the fluent-bit logs about ES document version conflicts, with content similar to the following:

[2022/04/13 02:45:16] [error] [output:es:es.0] error: Output ... version_conflict_engine_exception ...
[2022/04/13 02:45:16] [ warn] [engine] failed to flush chunk '1-1649817915.934638796.flb', retry in 9 seconds: task_id=1, input=tail.0 > output=es.0 (out_id=0)
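
For context, these errors show up as per-item statuses in an ES _bulk response: with Generate_ID enabled the es output attaches a generated _id to every create action (its documented purpose is to avoid duplicate documents when a chunk has to be retried), so a record that was already indexed comes back with a 409 instead of being written a second time. Roughly, each record travels as a bulk action/source pair like the minimal sketch below; the index, type and _id are taken from the detailed response that follows, while the source fields are only illustrative:

{ "create": { "_index": "app-logs-2022.04.13", "_type": "doc", "_id": "7b231272-f4ef-221e-e97e-a42baaab752f" } }
{ "@timestamp": "2022-04-13T02:45:16.000Z", "log": "..." }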

The detailed ES response (JSON-formatted) is:

{
  "took": 3,
  "errors": true,
  "items": [
   ...
    {
      "create": {
        "_index": "app-logs-2022.04.13",
        "_type": "doc",
        "_id": "7b231272-f4ef-221e-e97e-a42baaab752f",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 2114324,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "create": {
        "_index": "app-logs-2022.04.13",
        "_type": "doc",
        "_id": "7b231272-f4ef-221e-e97e-a42baaab752f",
        "status": 409,
        "error": {
          "type": "version_conflict_engine_exception",
          "reason": "[doc][7b231272-f4ef-221e-e97e-a42baaab752f]: version conflict, document already exists (current version [1])",
          "index_uuid": "1-BpXVxQS6SCi1EE4DR26Q",
          "shard": "0",
          "index": "app-logs-2022.04.13"
        }
      }
    }
  ...
  ]
}

It can be seen that there are two items in the response with the same document ID: the first one was successfully written to the ES index, whereas the second one was discarded due to a version conflict.
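
Since the second, conflicting create is rejected with a 409, any duplicates that actually become searchable within a single index would have to be stored under different _ids. A quick way to check for that from the ES side (a sketch only; the log field name and the .keyword sub-field are assumptions based on the tail input and a default dynamic mapping) is a terms aggregation that surfaces message values indexed more than once:

POST app-logs-2022.04.13/_search
{
  "size": 0,
  "aggs": {
    "duplicated_messages": {
      "terms": {
        "field": "log.keyword",
        "min_doc_count": 2,
        "size": 10
      }
    }
  }
}

Any bucket returned with a doc_count greater than 1 points at a log line that was indexed more than once despite the conflict errors.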

To Reproduce

We have not discovered any correlation with the content or origin of the logs. At the very least, both plain-text and JSON-encoded log messages are being duplicated.

Expected behavior

Log messages are successfully delivered to ES without being duplicated.

Your Environment

  • Version used:

ES: 6.8.xxx; fluent-bit: 1.9.1 (at least 1.8.xxx, up to 1.8.11, is also affected), with the Generate_ID option enabled for the ES output

  • Configuration:
config:
  service: |
    [SERVICE]
        Daemon Off
        Flush {{ .Values.flush }}
        Log_Level {{ .Values.logLevel }}
        Parsers_File parsers.conf
        Parsers_File custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port {{ .Values.metricsPort }}
        Health_Check On

  ## https://docs.fluentbit.io/manual/pipeline/inputs
  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On

        # To preserve fluent-bit before v1.6 behaviour set this to On
        Read_from_Head Off

  ## https://docs.fluentbit.io/manual/pipeline/filters
  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On

        Buffer_Size 256KB

    [FILTER]
        Name    modify
        Match   kube.*

        Remove @timestamp
        Remove time
        Remove ts

  ## https://docs.fluentbit.io/manual/pipeline/outputs
  outputs: |
    [OUTPUT]
        Name es
        Match kube.*
        Include_Tag_Key On
        Host fluent-bit-haproxy.logging-infra.svc.cluster.local.
        Port 9200
        Logstash_Format On
        Logstash_Prefix app-logs
        Type doc
        Generate_ID On
        Replace_Dots On

        # Retry_Limit False
        Retry_Limit 3

        Buffer_Size False
        Trace_Error True

  ## https://docs.fluentbit.io/manual/pipeline/parsers
  customParsers: |
    [PARSER]
        Name docker_no_time
        Format json
        Time_Keep Off
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L

  • Environment name and version (e.g. Kubernetes? What version?):

Kubernetes 1.19.16

  • Operating System and version:

Oracle Linux 7.9 (RHEL-fork)

kruftik · Apr 13 '22 03:04

Hi @kruftik , I believe I'm seeing the same issue with my stack too.

I'm using data streams in elastic, and I'm wondering if that is a factor. Are you too?

chibbert · Jun 21 '22 10:06

Hi, @chibbert

Unfortunately, we don't use data streams.

kruftik · Jun 23 '22 11:06

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] · Sep 22 '22 02:09

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] · Sep 28 '22 02:09