flush_timeout of multiline parser does not reset the state machine
Bug Report
To Reproduce
We are running a configuration that looks for a trigger (bar) in a log message and keeps appending subsequent messages to that record until the trigger appears again.
fluent-bit.conf
[SERVICE]
flush 1
log_level error
parsers_file parsers.conf
[INPUT]
Name tail
Path test.log
Read_from_Head True
multiline.parser bar
[OUTPUT]
Name stdout
Match *
parsers.conf
[MULTILINE_PARSER]
name bar
type regex
flush_timeout 1000
# rules | state name | regex pattern | next state
# ------|---------------|-------------------|-----------
rule "start_state" "/.*?bar.*/" "cont"
rule "cont" "/^(?!.*bar).*$/" "cont"
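As a quick sanity check of the two rules above, the following sketch classifies the sample lines with Python's re module. This is only an approximation: Fluent Bit uses its own regex engine, and the helper name classify is made up for this illustration. The point is that the two patterns are complementary, so once the parser is in the cont state every line without bar is appended.

```python
import re

# Patterns copied from parsers.conf; Python's `re` is used here as a
# stand-in for Fluent Bit's regex engine (an assumption of this sketch).
START = re.compile(r".*?bar.*")      # rule "start_state"
CONT = re.compile(r"^(?!.*bar).*$")  # rule "cont"

def classify(line):
    """Return (matches start_state rule, matches cont rule) for one line."""
    return (START.search(line) is not None, CONT.search(line) is not None)

for line in ["foo 0", "bar 1", "foo 2", "foo 3"]:
    print(line, classify(line))
```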
test.log
foo 0
bar 1
foo 2
foo 3
Run
$ fluent-bit -c fluent-bit.conf & sleep 3 && \
echo -e "foo 4\nfoo 5" >> test.log && sleep 3 && \
echo -e "foo 6\nbar 7" >> test.log && sleep 3 && \
echo -e "foo 8\nfoo 9" >> test.log && sleep 3 && \
kill $!
[...]
[0] tail.0: [[1721338326.383185852, {}], {"log"=>"foo 0
"}]
[0] tail.0: [[1721338326.383223193, {}], {"log"=>"bar 1
foo 2
foo 3
"}]
[0] tail.0: [[1721338326.383223193, {}], {"log"=>"foo 4
foo 5
"}]
[0] tail.0: [[1721338326.383223193, {}], {"log"=>"foo 6
"}]
[1] tail.0: [[1721338332.379612465, {}], {"log"=>"bar 7
"}]
[0] tail.0: [[1721338332.379612465, {}], {"log"=>"foo 8
foo 9
"}]
[...]
Grouping foo 2 and foo 3 with bar 1 is correct. But after the flush timeout has triggered, we are still stuck in the cont state of the state machine. So when we append foo 4 and foo 5 to the file, they are merged together although the parser never saw the bar trigger again. foo 6 is then correctly emitted as a single record, because it is followed by the bar 7 trigger. But after the next flush timeout, foo 8 and foo 9 are again erroneously merged together.
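The behaviour described above can be reproduced with a small toy model of the state machine. This is a sketch under assumptions, not Fluent Bit's implementation: simulate and reset_on_timeout are hypothetical names, each inner list stands for lines that arrive together and are followed by a flush_timeout expiry, and Python's re stands in for Fluent Bit's regex engine.

```python
import re

START = re.compile(r".*?bar.*")      # rule "start_state"
CONT = re.compile(r"^(?!.*bar).*$")  # rule "cont"

def simulate(batches, reset_on_timeout):
    """Toy model: every batch is followed by a flush_timeout expiry."""
    records, buffer, state = [], [], "start_state"

    def flush():
        # Emit the buffered lines as one record, if there are any.
        if buffer:
            records.append("\n".join(buffer))
            buffer.clear()

    for batch in batches:
        for line in batch:
            if state == "cont" and CONT.search(line):
                buffer.append(line)      # continuation of the current record
            elif START.search(line):
                flush()                  # trigger seen: close previous record
                buffer.append(line)
                state = "cont"
            else:
                records.append(line)     # no rule matched: pass through alone
        flush()                          # flush_timeout expired
        if reset_on_timeout:
            state = "start_state"        # the reset this report asks for
    return records

batches = [["foo 0", "bar 1", "foo 2", "foo 3"], ["foo 4", "foo 5"],
           ["foo 6", "bar 7"], ["foo 8", "foo 9"]]
print(simulate(batches, reset_on_timeout=False))  # mirrors the output above
print(simulate(batches, reset_on_timeout=True))   # foo 4/5 and foo 8/9 stay separate
```

Without the reset, the model merges foo 4 with foo 5 and foo 8 with foo 9, as in the output above; with the reset, each of them becomes its own record.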
Expected behavior
foo 4 and foo 5, as well as foo 8 and foo 9, should each be emitted as a separate single-line record and not be merged.
Your Environment
- Version used: 3.0.7 / 3.1.3
- Configuration: see above
- Environment name and version (e.g. Kubernetes? What version?): k8s and local
- Server type and version: N/A
- Operating System and version: Arch / Ubuntu
- Filters and plugins: see above
Additional context
As the whole point of the multiline parser is to merge multiple records, I don't think it makes any sense to not reset the state machine as soon as something is emitted by the flush_timeout.
I'm certainly no expert on the code base, but IMO the reset needs to happen in
https://github.com/fluent/fluent-bit/blob/574a69af744535b6e016965f02eef9f739a5df1e/src/multiline/flb_ml.c#L1356-L1359
and according to this comment
https://github.com/fluent/fluent-bit/blob/574a69af744535b6e016965f02eef9f739a5df1e/src/multiline/flb_ml.c#L1469-L1474
the reset probably has to be guarded with if (forced_flush) {...}.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
Still relevant.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This issue was closed because it has been stalled for 5 days with no activity.