ES-output: messages get duplicated (v1.8.x, v1.9.x)
Bug Report
Describe the bug
We have noticed a lot of error messages in the fluent-bit logs about ES document version conflicts, with content similar to:
[2022/04/13 02:45:16] [error] [output:es:es.0] error: Output ... version_conflict_engine_exception ...
[2022/04/13 02:45:16] [ warn] [engine] failed to flush chunk '1-1649817915.934638796.flb', retry in 9 seconds: task_id=1, input=tail.0 > output=es.0 (out_id=0)
where the detailed ES response is (JSON-formatted):
{
  "took": 3,
  "errors": true,
  "items": [
    ...
    {
      "create": {
        "_index": "app-logs-2022.04.13",
        "_type": "doc",
        "_id": "7b231272-f4ef-221e-e97e-a42baaab752f",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 2114324,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "create": {
        "_index": "app-logs-2022.04.13",
        "_type": "doc",
        "_id": "7b231272-f4ef-221e-e97e-a42baaab752f",
        "status": 409,
        "error": {
          "type": "version_conflict_engine_exception",
          "reason": "[doc][7b231272-f4ef-221e-e97e-a42baaab752f]: version conflict, document already exists (current version [1])",
          "index_uuid": "1-BpXVxQS6SCi1EE4DR26Q",
          "shard": "0",
          "index": "app-logs-2022.04.13"
        }
      }
    }
    ...
  ]
}
It may be seen that there are two items in the response with the same document _id: the first one was successfully written to the ES index, whereas the second one was discarded due to a version conflict.
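For illustration only, this pattern can be confirmed by grouping the items of such a bulk response by _id and looking for ids that collected both a 201 and a 409. The following is a minimal Python sketch; it assumes the response has been saved to a local file bulk_response.json (the filename is not part of our setup):

#!/usr/bin/env python3
"""Sketch: group items of a saved ES bulk response by _id and report their statuses."""
import json
from collections import defaultdict

with open("bulk_response.json") as f:        # assumed local copy of a full bulk response
    response = json.load(f)

statuses = defaultdict(list)
for item in response.get("items", []):
    action = item.get("create", {})          # bulk "create" actions, as in the response above
    statuses[action.get("_id")].append(action.get("status"))

# Any _id that collected both 201 (created) and 409 (version conflict)
# was sent to ES more than once, e.g. because the whole chunk was retried.
for doc_id, codes in statuses.items():
    if len(codes) > 1:
        print(doc_id, codes)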
To Reproduce
We have not discovered any correlation with the content or the origin of the logs. At a minimum, both plain-text and JSON-encoded log messages get duplicated.
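For illustration, actual duplicates (the same message indexed under different _ids) can be surfaced with a terms aggregation on the raw message field. The sketch below is a rough check only, assuming a log.keyword sub-field in the mapping and an ES endpoint reachable locally; those details are assumptions, not part of this report:

#!/usr/bin/env python3
"""Sketch: find log messages that were indexed more than once."""
import requests  # third-party HTTP client (pip install requests)

ES_URL = "http://localhost:9200"  # assumption: ES reachable locally
INDEX = "app-logs-*"              # index pattern matching the Logstash_Prefix below

query = {
    "size": 0,
    "aggs": {
        "duplicate_messages": {
            # "log.keyword" is an assumed keyword sub-field of the raw log line
            "terms": {"field": "log.keyword", "min_doc_count": 2, "size": 20}
        }
    },
}

resp = requests.get(f"{ES_URL}/{INDEX}/_search", json=query)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["duplicate_messages"]["buckets"]:
    print(f"{bucket['doc_count']}x  {bucket['key'][:120]}")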
Expected behavior
Log messages are delivered to ES without being duplicated.
Your Environment
- Version used:
ES: 6.8.xxx; fluent-bit: 1.9.1 (at least 1.8.xxx, up to 1.8.11, is also affected), with the Generate_ID option enabled for the ES output (a sketch of the intended deduplication behaviour follows the environment details below)
- Configuration:
config:
  service: |
    [SERVICE]
        Daemon Off
        Flush {{ .Values.flush }}
        Log_Level {{ .Values.logLevel }}
        Parsers_File parsers.conf
        Parsers_File custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port {{ .Values.metricsPort }}
        Health_Check On

  ## https://docs.fluentbit.io/manual/pipeline/inputs
  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        # To preserve fluent-bit before v1.6 behaviour set this to On
        Read_from_Head Off

  ## https://docs.fluentbit.io/manual/pipeline/filters
  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
        Buffer_Size 256KB

    [FILTER]
        Name modify
        Match kube.*
        Remove @timestamp
        Remove time
        Remove ts

  ## https://docs.fluentbit.io/manual/pipeline/outputs
  outputs: |
    [OUTPUT]
        Name es
        Match kube.*
        Include_Tag_Key On
        Host fluent-bit-haproxy.logging-infra.svc.cluster.local.
        Port 9200
        Logstash_Format On
        Logstash_Prefix app-logs
        Type doc
        Generate_ID On
        Replace_Dots On
        # Retry_Limit False
        Retry_Limit 3
        Buffer_Size False
        Trace_Error True

  ## https://docs.fluentbit.io/manual/pipeline/parsers
  customParsers: |
    [PARSER]
        Name docker_no_time
        Format json
        Time_Keep Off
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
- Environment name and version (e.g. Kubernetes? What version?):
Kubernetes 1.19.16
- Operating System and version:
Oracle Linux 7.9 (RHEL-fork)
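For context on why version conflicts show up at all: Generate_ID is intended to derive the document _id deterministically from the record, so that a record re-sent during a chunk retry collides (409) instead of being indexed a second time. The sketch below only illustrates that idea; the SHA-1-over-JSON scheme is an assumption for demonstration and is not fluent-bit's actual ID algorithm:

#!/usr/bin/env python3
"""Illustration: deterministic _id generation makes retried records collide."""
import hashlib
import json

def generate_id(record):
    # Assumption for illustration: hash a canonical serialization of the record.
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha1(payload).hexdigest()

index = {}  # stands in for the ES index

def bulk_create(record):
    """Return the HTTP status ES would give for a bulk 'create' action."""
    doc_id = generate_id(record)
    if doc_id in index:
        return 409  # version_conflict_engine_exception: document already exists
    index[doc_id] = record
    return 201

record = {"log": "hello from app", "kubernetes": {"pod_name": "app-0"}}
print(bulk_create(record))  # 201 -> created on the first flush
print(bulk_create(record))  # 409 -> the same record re-sent in a retried chunk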
Hi @kruftik, I believe I'm seeing the same issue with my stack too.
I'm using data streams in elastic, and I'm wondering if that is a factor. Are you too?
Hi @chibbert,
Unfortunately, we don't use data streams.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This issue was closed because it has been stalled for 5 days with no activity.