
filter_rewrite_tag: support recursion_action (#6074)

Open nokute78 opened this issue 2 years ago • 12 comments

In some cases, filter_rewrite_tag causes an infinite loop and hangs. This patch adds a recursion_action option that checks, before emitting, whether recursion would occur.

We can't check this at initialization, since filter_rewrite_tag needs an incoming record to determine routing. So filter_rewrite_tag performs the check at run time.
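As a rough illustration of the run-time check described above, the sketch below tests whether a rewritten tag would be matched again by the filter's own match pattern. It is not the code from this patch: `would_recurse` is a hypothetical helper, and POSIX `regex.h` stands in for the regex engine Fluent Bit actually uses. A real filter would also compile the pattern once at initialization rather than on every record.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Fluent Bit's API): returns true when the
 * rewritten tag matches the filter's own match_regex pattern, meaning
 * the re-emitted record would be routed back into the same filter.
 */
static bool would_recurse(const char *match_regex, const char *new_tag)
{
    regex_t re;
    bool matched;

    if (match_regex == NULL || new_tag == NULL) {
        return false;
    }

    /* An invalid pattern is treated as "no recursion detected". */
    if (regcomp(&re, match_regex, REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }

    matched = (regexec(&re, new_tag, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

With the configuration in this PR, the pattern `docker.logs.*` and the rewritten tag `docker.logs.apache` would be reported as recursive, which is the case the new option is meant to catch.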

'recursion_action' supports the following values:

Value         Description
none          No action, but fast.
drop          Drop records silently.
drop_and_log  Drop records and log a warning. Useful for analyzing the root cause.
exit          Default. Abort the program. Useful for testing.
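One way to map the option values listed above onto an internal setting is sketched here. The enum and `parse_recursion_action` are hypothetical names chosen for illustration, not identifiers from the patch. The sketch assumes exact, lower-case matching and falls back to `exit` when the option is unset, since that is the stated default.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal representation of the recursion_action option. */
enum recursion_action {
    RECURSION_NONE,          /* no check, fastest */
    RECURSION_DROP,          /* drop the record silently */
    RECURSION_DROP_AND_LOG,  /* drop the record and log a warning */
    RECURSION_EXIT,          /* abort the program (default) */
    RECURSION_INVALID        /* unrecognized value */
};

/* Parse the configured string; NULL means the option was not set. */
static enum recursion_action parse_recursion_action(const char *value)
{
    if (value == NULL) {
        return RECURSION_EXIT;  /* default, per the list of values */
    }
    if (strcmp(value, "none") == 0) {
        return RECURSION_NONE;
    }
    if (strcmp(value, "drop") == 0) {
        return RECURSION_DROP;
    }
    if (strcmp(value, "drop_and_log") == 0) {
        return RECURSION_DROP_AND_LOG;
    }
    if (strcmp(value, "exit") == 0) {
        return RECURSION_EXIT;
    }
    return RECURSION_INVALID;
}
```

Returning a distinct `RECURSION_INVALID` lets the caller reject a misspelled value at startup instead of silently falling back to a different action.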

Enter [N/A] in the box, if an item is not applicable to your change.

Testing

Before we can approve your change, please submit the following in a comment:

  • [X] Example configuration file for the change
  • [X] Debug log output from testing the change
  • [X] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Attached local packaging test output showing all targets (including any new ones) build.

Documentation

  • [ ] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Configuration

[INPUT]
    Name dummy
    Tag docker.logs.apache
    Dummy {"log":"body", "attrs":{"application_type":"apache"}}    

[FILTER]
    Name         rewrite_tag
    match_regex  docker.logs.*
    rule         $attrs['application_type'] ^(apache)$ docker.logs.$1 false
    emitter_name re-emitted
    recursion_action drop_and_log

[OUTPUT]
    name  stdout
    match *

Debug output / Valgrind output

$ valgrind --leak-check=full ../../bin/fluent-bit -c a.conf 
==15833== Memcheck, a memory error detector
==15833== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==15833== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==15833== Command: ../../bin/fluent-bit -c a.conf
==15833== 
Fluent Bit v2.0.0
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/09/23 08:45:21] [ info] [fluent bit] version=2.0.0, commit=2ee2d42130, pid=15833
[2022/09/23 08:45:21] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/09/23 08:45:21] [ info] [cmetrics] version=0.4.0
[2022/09/23 08:45:21] [ info] [sp] stream processor started
[2022/09/23 08:45:21] [ info] [output:stdout:stdout.0] worker #0 started
[2022/09/23 08:45:22] [ warn] [filter:rewrite_tag:rewrite_tag.0] recursion occured. tag=docker.logs.apache
[2022/09/23 08:45:22] [ warn] [filter:rewrite_tag:rewrite_tag.0] recursion occured. tag=docker.logs.apache
^C[2022/09/23 08:45:23] [engine] caught signal (SIGINT)
[2022/09/23 08:45:23] [ warn] [engine] service will shutdown in max 5 seconds
[2022/09/23 08:45:23] [ info] [input] pausing dummy.0
[2022/09/23 08:45:23] [ info] [engine] service has stopped (0 pending tasks)
[2022/09/23 08:45:23] [ info] [input] pausing dummy.0
[2022/09/23 08:45:23] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2022/09/23 08:45:24] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==15833== 
==15833== HEAP SUMMARY:
==15833==     in use at exit: 0 bytes in 0 blocks
==15833==   total heap usage: 1,406 allocs, 1,406 frees, 917,922 bytes allocated
==15833== 
==15833== All heap blocks were freed -- no leaks are possible
==15833== 
==15833== For lists of detected and suppressed errors, rerun with: -s
==15833== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

nokute78 avatar Sep 22 '22 23:09 nokute78

I've been thinking about this. Is there any reason that the default behavior shouldn't be either drop_and_log or exit?

It feels like either of these actions is a better option than the current default, which is a segfault or an endless loop. I suspect that the default case is the most efficient and, under normal operation, is probably best. However, this is a pretty big gotcha just hanging out there. If efficiency is that critical to your pipeline, you have the option to disable the check, but by default it should be on.

derekmceachern avatar Oct 01 '22 18:10 derekmceachern

@derekmceachern Thank you for the comment. I updated the patch so that 'exit' is the default value.

nokute78 avatar Oct 02 '22 10:10 nokute78

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Jan 01 '23 02:01 github-actions[bot]

Ping.

nokute78 avatar Jan 26 '23 12:01 nokute78

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Apr 28 '23 01:04 github-actions[bot]

ping

nokute78 avatar Apr 28 '23 22:04 nokute78

I fixed the conflict.

nokute78 avatar Jun 30 '23 23:06 nokute78

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Sep 29 '23 01:09 github-actions[bot]

ping

nokute78 avatar Oct 14 '23 23:10 nokute78

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Jan 13 '24 01:01 github-actions[bot]

ping

nokute78 avatar Feb 02 '24 23:02 nokute78

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar May 06 '24 01:05 github-actions[bot]