Multiline filter issue: the timestamp and metadata of some logs are missing

Open mirko-lazarevic opened this issue 5 months ago • 11 comments

Bug Report

Describe the bug When using a multiline filter, the timestamp and metadata of some logs are systematically missing. We are logging two different types of messages at a 10-second interval, like this:

Thu Jul 10 13:47:36 UTC 2025 Likely to fail
Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T13:47:36+00:00 should be ok
Thu Jul 10 13:47:46 UTC 2025 Likely to fail
Thu Jul 10 13:47:56 UTC 2025 Likely to fail
Thu Jul 10 13:48:06 UTC 2025 Likely to fail
2025-07-10T13:48:16+00:00 should be ok
Thu Jul 10 13:48:26 UTC 2025 Likely to fail
Thu Jul 10 13:48:36 UTC 2025 Likely to fail
Thu Jul 10 13:48:46 UTC 2025 Likely to fail

After logging the "Likely to fail" message twice, the above block is logged repeatedly in a loop (1x "should be ok" followed by 3x "Likely to fail").

The corresponding logs copied from the kubernetes file:

2025-07-10T08:47:36.927938223-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T08:47:36.929896848-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail
2025-07-10T08:47:36.931218787-05:00 stdout F 2025-07-10T13:47:36+00:00 should be ok
2025-07-10T08:47:46.936722867-05:00 stdout F Thu Jul 10 13:47:46 UTC 2025 Likely to fail
2025-07-10T08:47:56.942020106-05:00 stdout F Thu Jul 10 13:47:56 UTC 2025 Likely to fail
2025-07-10T08:48:06.946530058-05:00 stdout F Thu Jul 10 13:48:06 UTC 2025 Likely to fail
2025-07-10T08:48:16.951177698-05:00 stdout F 2025-07-10T13:48:16+00:00 should be ok
2025-07-10T08:48:26.955666785-05:00 stdout F Thu Jul 10 13:48:26 UTC 2025 Likely to fail
2025-07-10T08:48:36.960419370-05:00 stdout F Thu Jul 10 13:48:36 UTC 2025 Likely to fail
2025-07-10T08:48:46.965574420-05:00 stdout F Thu Jul 10 13:48:46 UTC 2025 Likely to fail
2025-07-10T08:48:56.970499744-05:00 stdout F 2025-07-10T13:48:56+00:00 should be ok

We have a multiline filter (alias "ml-detect" in the configuration below) which matches the first time format ("2025-07-08T15:25:07+00:00", message: "should be ok") in the start state and the second time format ("Tue Jul 8 15:25:17 UTC 2025", message: "Likely to fail") in the next state.

After the first occurrence of the "should be ok" message, all logs with the message "Likely to fail" carry the identical timestamp (e.g. [1752051270.328705182, {}]) of that previous "should be ok" message, even though they are logged 10 seconds later. In addition, the logs with the message "Likely to fail" do not have any metadata fields such as "time", "stream", or "file".

The logs below show example output, including the wrong timestamp and missing metadata. The first two "Likely to fail" logs have a correct timestamp and metadata. As soon as the first "should be ok" message is logged (our multiline parser matches the start state), the bug appears and the following "Likely to fail" logs have the wrong timestamp and no metadata.

[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.927938223, {}], {"time"=>"2025-07-10T08:47:36.927938223-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[1] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.929896848, {}], {"time"=>"2025-07-10T08:47:36.929896848-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Thu Jul 10 13:47:36 UTC 2025 Likely to fail", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"time"=>"2025-07-10T08:47:36.931218787-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:47:36+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:47:46 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:47:56 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155256.931218787, {}], {"log"=>"Thu Jul 10 13:48:06 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"time"=>"2025-07-10T08:48:16.951177698-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:48:16+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:26 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:36 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155296.951177698, {}], {"log"=>"Thu Jul 10 13:48:46 UTC 2025 Likely to fail"}]
[0] kube.var.log.containers.testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log: [[1752155336.970499744, {}], {"time"=>"2025-07-10T08:48:56.970499744-05:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-10T13:48:56+00:00 should be ok", "file"=>"/var/log/containers/testlog_default_testlog-457aba5a5cfffad53d2e728f92db4bfb7c2fed5c68e18b3f0edf046b883644e5.log"}]

We are using the multiline "cri" parser in the tail input to strip the line prefixes that Kubernetes/CRI adds to each log line. This way the multiline filter's parser can match our own time formats.
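
For illustration, a raw container log line such as

2025-07-10T08:47:36.927938223-05:00 stdout F Thu Jul 10 13:47:36 UTC 2025 Likely to fail

is split by the cri parser into roughly the following record fields (as also visible in the output above):

time   = 2025-07-10T08:47:36.927938223-05:00
stream = stdout
_p     = F
log    = Thu Jul 10 13:47:36 UTC 2025 Likely to fail

The multiline filter then only looks at the "log" value (multiline.key_content: log in the configuration below).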

To Reproduce

  • Example log message if applicable: see above
  • Steps to reproduce the problem:
    1. Deploy fluent-bit using the configuration below and make sure it has write access to "/var/log/fluent-bit/" for setting up the filesystem storage
    2. Create a testlog pod using the command: oc run testlog --image <any-linux-image> --command -- /bin/bash -c 'echo $(date) Likely to fail; echo $(date) Likely to fail; while true; do echo $(date -Iseconds) should be ok;sleep 10;echo $(date) Likely to fail;sleep 10;echo $(date) Likely to fail;sleep 10; echo $(date) Likely to fail;sleep 10;done'

It is important that the messages are not collected all at once but are logged with a delay (10 seconds in our test example). When using the read-from-head option in the tail input, the bug does not appear; it only shows up when the tail input has to wait for the next log line instead of collecting everything at once.
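
If no cluster is at hand, roughly the same pattern can presumably be reproduced locally by appending CRI-style lines to a file with a delay and pointing the tail input's path at that file. A minimal sketch using GNU date (the log file path is hypothetical):

# append CRI-formatted lines with a delay instead of running a pod
LOG=./var/testoutput.log
echo "$(date +%Y-%m-%dT%H:%M:%S.%N%:z) stdout F $(date -u) Likely to fail" >> "$LOG"; sleep 10
echo "$(date +%Y-%m-%dT%H:%M:%S.%N%:z) stdout F $(date -u) Likely to fail" >> "$LOG"; sleep 10
while true; do
  echo "$(date +%Y-%m-%dT%H:%M:%S.%N%:z) stdout F $(date -u -Iseconds) should be ok" >> "$LOG"; sleep 10
  echo "$(date +%Y-%m-%dT%H:%M:%S.%N%:z) stdout F $(date -u) Likely to fail" >> "$LOG"; sleep 10
  echo "$(date +%Y-%m-%dT%H:%M:%S.%N%:z) stdout F $(date -u) Likely to fail" >> "$LOG"; sleep 10
  echo "$(date +%Y-%m-%dT%H:%M:%S.%N%:z) stdout F $(date -u) Likely to fail" >> "$LOG"; sleep 10
done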

Expected behavior

We expect all logs to have the current timestamp and metadata.

Your Environment

  • Version used: fluent-bit v3.2.10
  • Configuration:
  fluent-bit.yaml: |
    service:
      flush: 2
      log_level: info
      http_server: on
      http_listen: 0.0.0.0
      http_port: 8081
      health_check: on
      hc_errors_count: 1
      hc_retry_failure_count: 1
      hc_period: 30
      storage.path: /var/log/fluent-bit/fluent-bit-cache2
      storage.max_chunks_up: 512
      storage.metrics: on
      storage.sync: full
      storage.backlog.mem_limit: 20M
      parsers_file: parsers.yaml
    pipeline:
      inputs:
        - name: tail
          tag: kube.*
          path: /var/log/containers/*testlog*.log
          path_key: file
          db: /var/log/fluent-bit/fluent-bit2.DB
          buffer_chunk_size: 512kb
          buffer_max_size: 2mb
          mem_buf_limit: 512mb
          multiline.parser: cri
          skip_long_lines: on
          refresh_interval: 20
          skip_empty_lines: on
          threaded: true
          read_from_head: false
          rotate_wait: 15
          storage.type: filesystem
          storage.pause_on_chunks_overlimit: off
      filters:
        - name: multiline
          alias: ml-detect
          match: 'kube.var.log.containers.*'
          multiline.parser: multiline_detector
          multiline.key_content: log
          flush_ms: 1000
          emitter_mem_buf_limit: 50M
      outputs:
        - name: stdout
          match: '*'
  parsers.yaml: |
    multiline_parsers:
      - name: multiline_detector
        type: regex
        flush_timeout: 1000
        rules:
          - state: start_state
            regex: '/^(?<log>(((\[?\d+(\-|\/)\d+(\-|\/)\d+.?\d+\:\d+\:\d+(?:[.,]\d{1,6})?(Z|\+\d+)?)\]?).*))/'
            next_state: notime_bib_cont
          - state: notime_bib_cont
            regex: '/^(?<log>(?!\[?\d+(\-|\/)\d+(\-|\/)\d+)(?!BIP\d{3,4}[EI]|\s*Caused\sby\s\[2\]|\{\s*\").*)/'
            next_state: notime_bib_cont
          - state: notime_bib_cont
            regex: '/^\s*(?!.+)'
            next_state: notime_bib_cont
  • Environment name and version:
    • OpenShift Version: 4.17.14
    • Kubernetes Version: v1.27.16+03a907c
  • Operating System and version: RHEL 8
  • Filters and plugins: included in configuration above

Additional context

We are trying to merge multiline logs into one log line using a multiline parser. In this particular example the use of our multiline parser is not visible, but it demonstrates a bug we detected.

mirko-lazarevic avatar Jul 10 '25 15:07 mirko-lazarevic

I recall some recent changes around timing, so I'm wondering if it is reproducible on the 4.x series?

patrick-stephens avatar Jul 10 '25 17:07 patrick-stephens

I was able to reproduce this with fluent-bit version 4.0.4

I used the attached Go program to generate kube-like log lines and adjusted the configuration above for the multiline filter so that the output file is picked up by the Match statement.
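
The adjusted pieces were presumably along these lines; the path and match pattern below are inferred from the tag kube.var.testoutput1.log and the file ./var/testoutput1.log visible in the output further down, not copied from the exact configuration used:

pipeline:
  inputs:
    - name: tail
      tag: kube.*
      path: ./var/testoutput1.log
      multiline.parser: cri
  filters:
    - name: multiline
      alias: ml-detect
      match: 'kube.var.*'
      multiline.parser: multiline_detector
      multiline.key_content: log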

Content of generated log file:

cat var/testoutput1.log

2025-07-13T09:37:25.389342+02:00 stdout F Sun Jul 13 07:37:25 UTC 2025 Likely to fail
2025-07-13T09:37:30.390684+02:00 stdout F Sun Jul 13 07:37:30 UTC 2025 Likely to fail
2025-07-13T09:37:35.391783+02:00 stdout F 2025-07-13T07:37:35+00:00 should be ok
2025-07-13T09:37:40.393228+02:00 stdout F Sun Jul 13 07:37:40 UTC 2025 Likely to fail
2025-07-13T09:37:45.393897+02:00 stdout F Sun Jul 13 07:37:45 UTC 2025 Likely to fail
2025-07-13T09:37:50.394664+02:00 stdout F 2025-07-13T07:37:50+00:00 should be ok
2025-07-13T09:37:55.395977+02:00 stdout F Sun Jul 13 07:37:55 UTC 2025 Likely to fail
2025-07-13T09:38:00.396967+02:00 stdout F Sun Jul 13 07:38:00 UTC 2025 Likely to fail

Fluent-bit console log:

./bin/fluent-bit -c ./etc/fluent-bit.yaml

Fluent Bit v4.0.4
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___  _____ 
|  ___| |                | |   | ___ (_) |           /   ||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/ 


[2025/07/13 09:37:16] [ info] [fluent bit] version=4.0.4, commit=59c4d4c089, pid=82387
[2025/07/13 09:37:16] [ info] [storage] ver=1.5.3, type=memory+filesystem, sync=full, checksum=off, max_chunks_up=512
[2025/07/13 09:37:16] [ info] [storage] backlog input plugin: storage_backlog.1
[2025/07/13 09:37:16] [ info] [simd    ] disabled
[2025/07/13 09:37:16] [ info] [cmetrics] version=1.0.4
[2025/07/13 09:37:16] [ info] [ctraces ] version=0.6.6
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] initializing
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] storage_strategy='filesystem' (memory + filesystem)
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] multiline core started
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] db: delete unmonitored stale inodes from the database: count=1
[2025/07/13 09:37:16] [ info] [input:tail:tail.0] thread instance initialized
[2025/07/13 09:37:16] [ info] [input:storage_backlog:storage_backlog.1] initializing
[2025/07/13 09:37:16] [ info] [input:storage_backlog:storage_backlog.1] storage_strategy='memory' (memory only)
[2025/07/13 09:37:16] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 19.1M
[2025/07/13 09:37:16] [ info] [filter:multiline:ml-detect] created emitter: emitter_for_ml-detect
[2025/07/13 09:37:16] [ info] [input:emitter:emitter_for_ml-detect] initializing
[2025/07/13 09:37:16] [ info] [input:emitter:emitter_for_ml-detect] storage_strategy='memory' (memory only)
[2025/07/13 09:37:16] [ info] [output:stdout:stdout.0] worker #0 started
[2025/07/13 09:37:16] [ info] [http_server] listen iface=0.0.0.0 tcp_port=8081
[2025/07/13 09:37:16] [ info] [sp] stream processor started
[2025/07/13 09:37:16] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
[2025/07/13 09:37:36] [ info] [filter:multiline:ml-detect] created new multiline stream for tail.0_kube.var.testoutput1.log
[0] kube.var.testoutput1.log: [[1752392245.389342000, {}], {"time"=>"2025-07-13T09:37:25.389342+02:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Sun Jul 13 07:37:25 UTC 2025 Likely to fail", "file"=>"./var/testoutput1.log"}]
[1] kube.var.testoutput1.log: [[1752392250.390684000, {}], {"time"=>"2025-07-13T09:37:30.390684+02:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Sun Jul 13 07:37:30 UTC 2025 Likely to fail", "file"=>"./var/testoutput1.log"}]
[0] kube.var.testoutput1.log: [[1752392255.391783000, {}], {"time"=>"2025-07-13T09:37:35.391783+02:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-13T07:37:35+00:00 should be ok", "file"=>"./var/testoutput1.log"}]
[0] kube.var.testoutput1.log: [[1752392255.391783000, {}], {"log"=>"Sun Jul 13 07:37:40 UTC 2025 Likely to fail"}]
[0] kube.var.testoutput1.log: [[1752392255.391783000, {}], {"log"=>"Sun Jul 13 07:37:45 UTC 2025 Likely to fail"}]
[0] kube.var.testoutput1.log: [[1752392270.394664000, {}], {"time"=>"2025-07-13T09:37:50.394664+02:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-07-13T07:37:50+00:00 should be ok", "file"=>"./var/testoutput1.log"}]
[0] kube.var.testoutput1.log: [[1752392270.394664000, {}], {"log"=>"Sun Jul 13 07:37:55 UTC 2025 Likely to fail"}]
[0] kube.var.testoutput1.log: [[1752392270.394664000, {}], {"log"=>"Sun Jul 13 07:38:00 UTC 2025 Likely to fail"}]
^C[2025/07/13 09:38:07] [engine] caught signal (SIGINT)
[2025/07/13 09:38:07] [ info] [input] pausing storage_backlog.1
[2025/07/13 09:38:07] [ info] [input] pausing emitter_for_ml-detect
[2025/07/13 09:38:07] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2025/07/13 09:38:07] [ info] [output:stdout:stdout.0] thread worker #0 stopped

Go sample to generate Kubernetes-like output:

package main

import (
	"fmt"
	"time"
)

const (
	kubeTimeConst = "2006-01-02T15:04:05.999999999-07:00"
	isoTimeConst  = "2006-01-02T15:04:05-07:00"
	recordDelay   = time.Second * 5
)

func main() {
	unixTimeRecord() // => meta data available
	unixTimeRecord() // => meta data available
	for {
		isoTimeRecord()  // => meta data available
		unixTimeRecord() // => meta data missing
		unixTimeRecord() // => meta data missing
	}
}

// creates a test record with a kubernetes like line header
// followed by an iso date format that is matching the
// start-definition of the multi-line reader
func isoTimeRecord() {
	t := time.Now()
	fmt.Printf("%s %s should be ok\n", kubeLineHeader(t), t.UTC().Format(isoTimeConst))
	time.Sleep(recordDelay)
}

// creates a test record with a kubernetes like line header
// followed by an unix date format that is matching the
// continuation-definition of the multi-line reader
func unixTimeRecord() {
	t := time.Now()
	fmt.Printf("%s %s Likely to fail\n", kubeLineHeader(t), t.UTC().Format(time.UnixDate))
	time.Sleep(recordDelay)
}

// creates a kubernetes line header that is matched
// by the fluent-bit CRI parser.
func kubeLineHeader(t time.Time) string {
	return fmt.Sprintf("%s stdout F", t.Format(kubeTimeConst))
}
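
To feed these lines to Fluent Bit, the program's output just needs to be redirected to the tailed file, e.g. (assuming the source file is saved as main.go):

go run main.go >> var/testoutput1.log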

BernhardSchmid avatar Jul 13 '25 07:07 BernhardSchmid

Thanks @BernhardSchmid, I just want to compliment you on the effort made for the reproducer.

patrick-stephens avatar Jul 13 '25 09:07 patrick-stephens

@patrick-stephens Is there any progress on this issue? Do you happen to know?

BernhardSchmid avatar Sep 24 '25 18:09 BernhardSchmid

@patrick-stephens Is there any progress on this issue? Do you happen to know?

Without an update here I would assume not, unfortunately.

patrick-stephens avatar Sep 25 '25 09:09 patrick-stephens

Hi - is there a chance to get an update on this one here? Regards

BernhardSchmid avatar Oct 29 '25 12:10 BernhardSchmid

Would be great if someone could work on this issue to fix it.

IngoHolder avatar Nov 17 '25 09:11 IngoHolder

Hi @patrick-stephens: Sorry for tagging you a second time. Do you see any chance of getting at least a triage, or perhaps an outlook on if/when there is a chance to get this fixed?

BernhardSchmid avatar Nov 28 '25 14:11 BernhardSchmid

I'm afraid I have little to no time for OSS at the moment, so I have not even looked at this. I will see if I can raise it for you, but it would also be good to bring it up at the community meeting: https://www.meetup.com/fluent-community-meeting/

Is it still happening with 4.2?

patrick-stephens avatar Nov 28 '25 15:11 patrick-stephens

@patrick-stephens Yes, it is still happening with the latest version.

I spent some time over the weekend looking into the issue. I believe I have fixed it with this PR: https://github.com/fluent/fluent-bit/pull/11231. I would appreciate a review to confirm whether the approach makes sense and whether the fix is in the right place in the codebase.

mirko-lazarevic avatar Dec 01 '25 11:12 mirko-lazarevic

Ah, it's always good to contribute, so much appreciated. I'll go have a look and see if I can push it along to the right folks.

patrick-stephens avatar Dec 01 '25 11:12 patrick-stephens