Multiline filter issue: the timestamp and metadata of some logs are missing
Bug Report
Describe the bug
When using a multiline filter, the timestamp and metadata of some logs are systematically missing. We are logging two different types of messages at an interval of 10 seconds, like this:
Thu Jul 10 13:47:36 UTC 2025 Likely to fail
Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T13:47:36+00:00 should be ok
Thu Jul 10 13:47:46 UTC 2025 Likely to fail
Thu Jul 10 13:47:56 UTC 2025 Likely to fail
Thu Jul 10 13:48:06 UTC 2025 Likely to fail
2025-07-10T13:48:16+00:00 should be ok
Thu Jul 10 13:48:26 UTC 2025 Likely to fail
Thu Jul 10 13:48:36 UTC 2025 Likely to fail
Thu Jul 10 13:48:46 UTC 2025 Likely to fail
After logging the "Likely to fail" message twice, the block above is logged repeatedly in a loop (1x "should be ok", followed by 3x "Likely to fail").
The corresponding lines copied from the Kubernetes log file:
2025-07-10T08:47:36.927938223-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T08:47:36.929896848-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T08:47:36.931218787-05:00 stdout F 2025-07-10T13:47:36+00:00 should be ok
2025-07-10T08:47:46.936722867-05:00 stdout F Thu Jul 10 13:47:46 UTC 2025 Likely to fail
2025-07-10T08:47:56.942020106-05:00 stdout F Thu Jul 10 13:47:56 UTC 2025 Likely to fail
2025-07-10T08:48:06.946530058-05:00 stdout F Thu Jul 10 13:48:06 UTC 2025 Likely to fail
2025-07-10T08:48:16.951177698-05:00 stdout F 2025-07-10T13:48:16+00:00 should be ok
2025-07-10T08:48:26.955666785-05:00 stdout F Thu Jul 10 13:48:26 UTC 2025 Likely to fail
2025-07-10T08:48:36.960419370-05:00 stdout F Thu Jul 10 13:48:36 UTC 2025 Likely to fail
2025-07-10T08:48:46.965574420-05:00 stdout F Thu Jul 10 13:48:46 UTC 2025 Likely to fail
2025-07-10T08:48:56.970499744-05:00 stdout F 2025-07-10T13:48:56+00:00 should be ok
We have a multiline filter (alias "ml-detect" in the configuration below) whose start state matches the first time format ("2025-07-08T15:25:07+00:00", message "should be ok") and whose next state matches the second time format ("Tue Jul 8 15:25:17 UTC 2025", message "Likely to fail"); see the state mapping sketched after the environment section below.
After the first occurrence of the "should be ok" message, all logs with the message "Likely to fail" carry the identical timestamp (e.g. [1752051270.328705182, {}]) of that previous "should be ok" message, even though they are logged 10 seconds later. In addition, the records with the message "Likely to fail" do not have any metadata fields such as "time", "stream", or "file".
The logs below show an example of the output, including the wrong timestamp and the missing metadata. The first two "Likely to fail" records still have a correct timestamp and metadata. As soon as the first "should be ok" message is logged (our multiline parser matches the start state), the bug appears and the following "Likely to fail" records have the wrong timestamp and no metadata.
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.927938223, {}], {"time"=>"2025-07-10T08:47:36.927938223-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[1] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.929896848, {}], {"time"=>"2025-07-10T08:47:36.929896848-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"time"=>"2025-07-10T08:47:36.931218787-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:47:36+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:47:46 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:47:56 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:48:06 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"time"=>"2025-07-10T08:48:16.951177698-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:48:16+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:26 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:36 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:46 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155336.970499744, {}], {"time"=>"2025-07-10T08:48:56.970499744-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:48:56+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
We are using the multiline "cri" parser in the tail input to strip the line headers added by Kubernetes (CRI format). This way, the multiline filter only sees our own time formats in the "log" key.
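For reference, the relevant two-stage wiring, excerpted from the full configuration further below (comments added here for clarity), looks like this:

pipeline:
  inputs:
    - name: tail
      tag: kube.*
      path: /var/log/containers/*testlog*.log
      multiline.parser: cri            # splits the CRI line header into "time", "stream", "_p"; the message stays in "log"
  filters:
    - name: multiline
      alias: ml-detect
      match: 'kube.var.log.containers.*'
      multiline.parser: multiline_detector
      multiline.key_content: log       # our start/continuation regexes only see the "log" key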
To Reproduce
- Example log message if applicable: see above
- Steps to reproduce the problem:
- Deploy fluent-bit using the configuration below and make sure it has write access to "/var/log/fluent-bit/" so the filesystem storage can be set up
- Create a testlog pod using the command:
oc run testlog --image <any-linux-image> --command -- /bin/bash -c 'echo $(date) Likely to fail; echo $(date) Likely to fail; while true; do echo $(date -Iseconds) should be ok;sleep 10;echo $(date) Likely to fail;sleep 10;echo $(date) Likely to fail;sleep 10; echo $(date) Likely to fail;sleep 10;done'
It is important that the messages are not collected all at once but are logged with a delay (10 seconds in our test example). When using the read_from_head option in the tail input, the bug does not appear; fluent-bit needs to wait for the next log line to arrive instead of collecting everything at once.
Expected behavior
We expect all logs to have the current timestamp and metadata.
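Concretely, the fourth raw line above (the 13:47:46 "Likely to fail" message) would ideally come out like the first two records, i.e. with its own timestamp and metadata keys, roughly like this (an illustrative sketch only; the values are taken from the corresponding raw CRI line):

[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155266.936722867, {}], {"time"=>"2025-07-10T08:47:46.936722867-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:46 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]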
Your Environment
- Version used: fluent-bit v3.2.10
- Configuration:
fluent-bit.yaml: |
  service:
    flush: 2
    log_level: info
    http_server: on
    http_listen: 0.0.0.0
    http_port: 8081
    health_check: on
    hc_errors_count: 1
    hc_retry_failure_count: 1
    hc_period: 30
    storage.path: /var/log/fluent-bit/fluent-bit-cache2
    storage.max_chunks_up: 512
    storage.metrics: on
    storage.sync: full
    storage.backlog.mem_limit: 20M
    parsers_file: parsers.yaml
  pipeline:
    inputs:
      - name: tail
        tag: kube.*
        path: /var/log/containers/*testlog*.log
        path_key: file
        db: /var/log/fluent-bit/fluent-bit2.DB
        buffer_chunk_size: 512kb
        buffer_max_size: 2mb
        mem_buf_limit: 512mb
        multiline.parser: cri
        skip_long_lines: on
        refresh_interval: 20
        skip_empty_lines: on
        threaded: true
        read_from_head: false
        rotate_wait: 15
        storage.type: filesystem
        storage.pause_on_chunks_overlimit: off
    filters:
      - name: multiline
        alias: ml-detect
        match: 'kube.var.log.containers.*'
        multiline.parser: multiline_detector
        multiline.key_content: log
        flush_ms: 1000
        emitter_mem_buf_limit: 50M
    outputs:
      - name: stdout
        match: '*'
parsers.yaml: |
  multiline_parsers:
    - name: multiline_detector
      type: regex
      flush_timeout: 1000
      rules:
        - state: start_state
          regex: '/^(?<log>(((\[?\d+(\-|\/)\d+(\-|\/)\d+.?\d+\:\d+\:\d+(?:[.,]\d{1,6})?(Z|\+\d+)?)\]?).*))/'
          next_state: notime_bib_cont
        - state: notime_bib_cont
          regex: '/^(?<log>(?!\[?\d+(\-|\/)\d+(\-|\/)\d+)(?!BIP\d{3,4}[EI]|\s*Caused\sby\s\[2\]|\{\s*\").*)/'
          next_state: notime_bib_cont
        - state: notime_bib_cont
          regex: '/^\s*(?!.+)'
          next_state: notime_bib_cont
- Environment name and version:
- OpenShift Version: 4.17.14
- Kubernetes Version: v1.27.16+03a907c
- Operating System and version: RHEL 8
- Filters and plugins: included in configuration above
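To make the multiline_detector rules in the configuration above easier to follow, here is a rough mapping of the sample lines to the parser states (derived by reading the regexes; for illustration only):
- "2025-07-10T13:47:36+00:00 should be ok" -> start_state (the line begins with a numeric date/timestamp)
- "Thu Jul 10 13:47:46 UTC 2025 Likely to fail" -> notime_bib_cont (no leading numeric date, so it is treated as a continuation line)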
Additional context
We are trying to merge multiline logs into one log line using a multiline parser. In this particular example the use of our multiline parser is not visible, but it shows a bug we detected.
I recall some recent changes around timing, so I am wondering if it is reproducible on the 4.x series?
I was able to reproduce this with fluent-bit version 4.0.4.
I used the attached Go program to generate kube-like log lines and adjusted the configuration above for the multiline filter so that the output file gets considered by the Match statement.
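The adjusted parts of the configuration were roughly the following (a sketch reconstructed from the console output below; the exact path and match values are assumptions):

pipeline:
  inputs:
    - name: tail
      tag: kube.*
      path: ./var/*.log              # tail the locally generated file instead of /var/log/containers
      path_key: file
      multiline.parser: cri
  filters:
    - name: multiline
      alias: ml-detect
      match: 'kube.var.*'            # widened so that kube.var.testoutput1.log is matched
      multiline.parser: multiline_detector
      multiline.key_content: log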
Content of generated log file:
cat var/testoutput1.log
2025-07-13T09:37:25.389342+02:00 stdout F Sun Jul 13 07:37:25 UTC 2025 Likely to fail
2025-07-13T09:37:30.390684+02:00 stdout F Sun Jul 13 07:37:30 UTC 2025 Likely to fail
2025-07-13T09:37:35.391783+02:00 stdout F 2025-07-13T07:37:35+00:00 should be ok
2025-07-13T09:37:40.393228+02:00 stdout F Sun Jul 13 07:37:40 UTC 2025 Likely to fail
2025-07-13T09:37:45.393897+02:00 stdout F Sun Jul 13 07:37:45 UTC 2025 Likely to fail
2025-07-13T09:37:50.394664+02:00 stdout F 2025-07-13T07:37:50+00:00 should be ok
2025-07-13T09:37:55.395977+02:00 stdout F Sun Jul 13 07:37:55 UTC 2025 Likely to fail
2025-07-13T09:38:00.396967+02:00 stdout F Sun Jul 13 07:38:00 UTC 2025 Likely to fail
Fluent-bit console log:
./bin/fluent-bit -c ./etc/fluent-bit.yaml
Fluent Bit v4.0.4
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ ___ _____
| ___| | | | | ___ (_) | / || _ |
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __/ /| || |/' |
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| || /| |
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /\___ |\ |_/ /
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ |_(_)___/
[2025/07/13 09:37:16] [ info] [fluent bit] version=4.0.4, commit=59c4d4c089, pid=82387
[2025/07/13 09:37:16] [ info] [storage] ver=1.5.3, type=memory+filesystem, sync=full, checksum=off, max_chunks_up=512
[2025/07/13 09:37:16] [ info] [storage] backlog input plugin: storage_backlog.1
[2025/07/13 09:37:16] [ info] [simd ] disabled
[2025/07/13 09:37:16] [ info] [cmetrics] version=1.0.4
[2025/07/13 09:37:16] [ info] [ctraces ] version=0.6.6
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] initializing
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] storage_strategy='filesystem' (memory + filesystem)
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] multiline core started
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] db: delete unmonitored stale inodes from the database: count=1
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] thread instance initialized
[2025/07/13 09:37:16] [ info] [input:storage_backlog:storage_backlog.1] initializing
[2025/07/13 09:37:16] [ info] [input:storage_backlog:storage_backlog.1] storage_strategy='memory' (memory only)
[2025/07/13 09:37:16] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 19.1M
[2025/07/13 09:37:16] [ info] [filter:multiline:ml-detect] created emitter: emitter_for_ml-detect
[2025/07/13 09:37:16] [ info] [input:emitter:emitter_for_ml-detect] initializing
[2025/07/13 09:37:16] [ info] [input:emitter:emitter_for_ml-detect] storage_strategy='memory' (memory only)
[2025/07/13 09:37:16] [ info] [output:stdout:stdout.0] worker #0 started
[2025/07/13 09:37:16] [ info] [http_server] listen iface=0.0.0.0 tcp_port=8081
[2025/07/13 09:37:16] [ info] [sp] stream processor started
[2025/07/13 09:37:16] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
[2025/07/13 09:37:36] [ info] [filter:multiline:ml-detect] created new multiline stream for tail.0_kube.var.testoutput1.log
[0] kube.var.testoutput1.log: [[1752392245.389342000, {}], {"time"=>"2025-07-13T09:37:25.389342+02:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Sun Jul 13 07:37:25 UTC 2025 Likely to fail", "file"=>"./var/testoutput1.log"}]
[1] kube.var.testoutput1.log: [[1752392250.390684000, {}], {"time"=>"2025-07-13T09:37:30.390684+02:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Sun Jul 13 07:37:30 UTC 2025 Likely to fail", "file"=>"./var/testoutput1.log"}]
[0] kube.var.testoutput1.log: [[1752392255.391783000, {}], {"time"=>"2025-07-13T09:37:35.391783+02:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-13T07:37:35+00:00 should be ok", "file"=>"./var/testoutput1.log"}]
[0] kube.var.testoutput1.log: [[1752392255.391783000, {}], {"log"=>"Sun Jul 13 07:37:40 UTC 2025 Likely to fail"}]
[0] kube.var.testoutput1.log: [[1752392255.391783000, {}], {"log"=>"Sun Jul 13 07:37:45 UTC 2025 Likely to fail"}]
[0] kube.var.testoutput1.log: [[1752392270.394664000, {}], {"time"=>"2025-07-13T09:37:50.394664+02:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-13T07:37:50+00:00 should be ok", "file"=>"./var/testoutput1.log"}]
[0] kube.var.testoutput1.log: [[1752392270.394664000, {}], {"log"=>"Sun Jul 13 07:37:55 UTC 2025 Likely to fail"}]
[0] kube.var.testoutput1.log: [[1752392270.394664000, {}], {"log"=>"Sun Jul 13 07:38:00 UTC 2025 Likely to fail"}]
^C[2025/07/13 09:38:07] [engine] caught signal (SIGINT)
[2025/07/13 09:38:07] [ info] [input] pausing storage_backlog.1
[2025/07/13 09:38:07] [ info] [input] pausing emitter_for_ml-detect
[2025/07/13 09:38:07] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2025/07/13 09:38:07] [ info] [output:stdout:stdout.0] thread worker #0 stopped
Go sample to generate Kubernetes-like output:
package main

import (
    "fmt"
    "time"
)

const (
    kubeTimeConst = "2006-01-02T15:04:05.999999999-07:00"
    isoTimeConst  = "2006-01-02T15:04:05-07:00"
    recordDelay   = time.Second * 5
)

func main() {
    unixTimeRecord() // => meta data available
    unixTimeRecord() // => meta data available
    for {
        isoTimeRecord()  // => meta data available
        unixTimeRecord() // => meta data missing
        unixTimeRecord() // => meta data missing
    }
}

// creates a test record with a kubernetes like line header
// followed by an iso date format that is matching the
// start-definition of the multi-line reader
func isoTimeRecord() {
    t := time.Now()
    fmt.Printf("%s %s should be ok\n", kubeLineHeader(t), t.UTC().Format(isoTimeConst))
    time.Sleep(recordDelay)
}

// creates a test record with a kubernetes like line header
// followed by an unix date format that is matching the
// continuation-definition of the multi-line reader
func unixTimeRecord() {
    t := time.Now()
    fmt.Printf("%s %s Likely to fail\n", kubeLineHeader(t), t.UTC().Format(time.UnixDate))
    time.Sleep(recordDelay)
}

// creates a kubernetes line header that is matched
// by the fluent-bit CRI parser.
func kubeLineHeader(t time.Time) string {
    return fmt.Sprintf("%s stdout F", t.Format(kubeTimeConst))
}
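Assuming the program's stdout was redirected into the tailed file, it can be run for example with "go run main.go >> ./var/testoutput1.log" while fluent-bit is tailing that path.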
Thanks @BernhardSchmid, I just want to compliment you on the effort made for the reproducer.
@patrick-stephens Is there any progress with this issue, do you happen to know this?
Without an update here I would assume not unfortunately.
Hi - is there a chance to get an update on this one here? Regards
Would be great if someone could work on this issue to fix it.
Hi @patrick-stephens: Sorry for tagging you a second time. Do you see any chance to get at least a triage, or perhaps an outlook on if/when there is a chance to get this fixed?
I'm afraid I have little to no time for OSS at the moment so have not even looked at this. I will see if I can raise it for you but also good to bring it up at the community meeting too: https://www.meetup.com/fluent-community-meeting/
Is it still happening with 4.2?
@patrick-stephens Yes, it is still happening with the latest version.
I spent some time over the weekend looking into the issue. I believe I have fixed it with this PR: https://github.com/fluent/fluent-bit/pull/11231. I would appreciate a review to confirm whether the approach makes sense and whether the fix sits in the right place in the codebase.
Ah, always good to contribute, much appreciated. I'll go have a look and see if I can push it along to the right folks.