fluent-bit icon indicating copy to clipboard operation
fluent-bit copied to clipboard

out_es: fix trace_error truncating the response

Open PettitWesley opened this issue 3 years ago • 5 comments

Fixes #5314 5314

Signed-off-by: Wesley Pettit [email protected]


Enter [N/A] in the box, if an item is not applicable to your change.

Testing Before we can approve your change; please submit the following in a comment:

  • [ ] Example configuration file for the change
  • [ ] Debug log output from testing the change
  • [ ] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [ ] Attached local packaging test output showing all targets (including any new ones) build.

Documentation

  • [ ] Documentation required for this feature

Backporting

  • [ ] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

PettitWesley avatar Jul 21 '22 01:07 PettitWesley

@nokute78 updated

PettitWesley avatar Jul 25 '22 21:07 PettitWesley

@nokute78 I agree that fixing the flb_log functions is more ideal, but I am not sure of the best way to do it. The size limitation seems very built into how the code works: https://github.com/fluent/fluent-bit/blob/v1.9.6/src/flb_log.c#L47

It seems like this code requires a max log message size... it can't be infinite.. fundamentally we must have some buffer size. May be it should be user configurable?

@edsiper

PettitWesley avatar Jul 28 '22 00:07 PettitWesley

there is a limitation because we use a pipe. we would need to move to a ring-buffer strategy for v2.0 (we already added a ring buffer api)

edsiper avatar Jul 28 '22 22:07 edsiper

@edsiper What should we do for now? We need to fix this bug before 2.0. Should we stick with my fix to print to stderr? But then as @nokute78 mentioned this isn't ideal since Fluent Bit supports Log_File in Service section and we will break that since stderr won't be directed to the log file.

PettitWesley avatar Jul 28 '22 23:07 PettitWesley

@PettitWesley Hmm, a workaround is

  • flb_log
    • Add new flb_log_ API that returns a truncated limit size. It is a size of msg of log_message https://github.com/fluent/fluent-bit/blob/v1.9.6/src/flb_log.c#L47.
  • out_es
    • Check if resp_size is greater than the limit size using new flb_log API.
    • If it is true, invoke flb_plg_error and passed log size should be the limit size.
    • Move resp.payload pointer , decrement resp_size and iterates until resp_size is 0.

nokute78 avatar Jul 31 '22 02:07 nokute78

@nokute78 @edsiper Sorry, but I'm no longer feeling strongly about allocating my effort to build a new flb_log style function that can return the amount of bytes read so we can iteratively write the buffer.

Why?

  1. This actually won't necessarily be better for the user. See each log msg is not guaranteed to be written sequentially to stdout/wherever AFAICT. If you have multiple threads running and pushing logs, it can get intermixed. So then the payload may become unreadable.
  2. This is a debugging option, and I think as a short term fix it does not need to have the most perfect experience. In my interactions with customers, I rarely ever encounter someone using the log file option. (Which is the reason for this more complicated proposal, if we only cared about stdout/stderr, then we could just use printf).

I would like to contribute the fix that I have here as a short term improvement that gets us closer to the ideal.

                    if (c->resp.payload_size < 4000) {
                        flb_plg_error(ctx->ins, "error: Output\n%s",
                                      c->resp.payload);
                    } else {
                        /*
                        * We must use fwrite since the flb_log functions
                        * will truncate data at 4KB
                        */
                        fwrite(c->resp.payload, 1, c->resp.payload_size, stderr);
                        fflush(stderr);
                    }

Why truncate at 4000 not 4096?

First of all, the log message buffer is slightly smaller than 4096: https://github.com/fluent/fluent-bit/blob/master/src/flb_log.c#L47

And then it always has to add the header with timestamp and plugin name (including hidden chars for formatting), so the idea is that we don't know how big that bit is, and as a short term fix, 4000 is a safe number, we can always print at least 4000 bytes.

In the future, we could improve this further, but this short term fix will unblock my customer who reported it.

PettitWesley avatar Aug 16 '22 16:08 PettitWesley

sounds good to me to merge this now.

@nokute78 I like the idea of https://github.com/fluent/fluent-bit/pull/5761#issuecomment-1200331827 where the logger api can truncate the message size.. can you implement that ?

edsiper avatar Aug 16 '22 18:08 edsiper

@edsiper I created a patch https://github.com/fluent/fluent-bit/pull/5952 . Could you check it ?

nokute78 avatar Aug 28 '22 11:08 nokute78