
in_forward: SIGSEGV error when ACK is enabled due to null connection in send_ack

Open mirko-lazarevic opened this issue 4 months ago • 1 comments

Bug Report

Describe the bug

I'm experiencing a segmentation fault (SIGSEGV) in the in_forward plugin when ACK is enabled. When ingestion is paused ([ warn] [input] forward.0 paused (mem buf overlimit)), all active connections appear to be closed before send_ack is executed, which leads to a null pointer dereference.

The crash is reached from the following code snippet in fw_prot.c:

/* Handle ACK response */
if (chunk_id != -1) {
    chunk = root.via.array.ptr[2].via.map.ptr[chunk_id].val;
    send_ack(ctx->ins, conn, chunk);
}

conn->connection is NULL when send_ack is called, so the later call to flb_connection_get_flags dereferences a null pointer and crashes.
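One possible mitigation is to check the connection before attempting the ACK. The following is only a minimal sketch under stated assumptions: the struct definitions are simplified stand-ins for the real Fluent Bit types, and fw_conn_can_ack and handle_ack are hypothetical helper names, not existing Fluent Bit functions. It also assumes conn itself is still valid memory; if the pause path frees conn, a NULL check alone would not be enough.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the Fluent Bit types (assumption: the real
 * structs in the source tree carry many more fields). */
struct flb_connection {
    int flags;
};

struct fw_conn {
    struct flb_connection *connection; /* NULL once the connection is closed */
};

/* Hypothetical helper: returns 1 when an ACK can be written on conn. */
static int fw_conn_can_ack(const struct fw_conn *conn)
{
    return conn != NULL && conn->connection != NULL;
}

/* Sketch of a guarded ACK path.
 * Returns 0 when the ACK would be sent, -1 when it is skipped. */
static int handle_ack(struct fw_conn *conn, const char *chunk)
{
    if (!fw_conn_can_ack(conn)) {
        fprintf(stderr, "skipping ACK for chunk %s: connection closed\n",
                chunk);
        return -1;
    }
    /* In the plugin, send_ack(ctx->ins, conn, chunk) would be called here. */
    return 0;
}
```

With this guard, a paused input that has already dropped its connections would skip the ACK rather than crash; the sender would then see a missing ACK and could retry.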

To Reproduce

  • Example log message if applicable:
[2024/09/27 16:40:07] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:07] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:07] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:07] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:07] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:07] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:07] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:07] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:08] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:08] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:08] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:08] [ info] [input:forward:forward.0] ==== sending ack message back ====
[2024/09/27 16:40:08] [ warn] [input] forward.0 paused (mem buf overlimit)
[2024/09/27 16:40:08] [ info] [input] pausing forward.0
[2024/09/27 16:40:08] [ info] [input:forward:forward.0] ==== deleting all connections ====
[2024/09/27 16:40:08] [ info] [input:forward:forward.0] ==== all connections have been deletes ====
[2024/09/27 16:40:08] [ info] [input:forward:forward.0] ==== sending ack message back ====
Process 30395 stopped
* thread #17, name = 'flb-pipeline', stop reason = EXC_BAD_ACCESS (code=1, address=0x190)
    frame #0: 0x00000001000971e0 fluent-bit`flb_connection_get_flags(connection=0x0000000000000000) at flb_connection.c:199:45
   196
   197  int flb_connection_get_flags(struct flb_connection *connection)
   198  {
-> 199      return flb_stream_get_flags(connection->stream);
   200  }
   201
   202  void flb_connection_reset_connection_timeout(struct flb_connection *connection)
Target 0: (fluent-bit) stopped.
  • Steps to reproduce the problem:
  1. Enable ACK by having the sender include a chunk ID in the data sent to the forward input plugin
  2. Run Fluent Bit in an environment where it can exceed the memory buffer limit (Mem_Buf_Limit)
  3. Wait until the memory buffer overlimit condition triggers and the input is paused
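For step 1, the payload a sender would write to the socket can be sketched as follows. This assumes the Forward protocol's Forward mode layout, [tag, [[time, record]], option], with the ACK request carried in the option map under the key chunk, which matches the root.via.array.ptr[2] access in the snippet above. The function names, the "log" record key, and the hand-rolled MessagePack encoding are illustrative assumptions, not part of Fluent Bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a MessagePack fixstr; strings longer than 31 bytes are truncated
 * because fixstr cannot represent them. Returns the new offset. */
static size_t put_fixstr(uint8_t *buf, size_t off, const char *s)
{
    size_t len = strlen(s);
    if (len > 31) {
        len = 31;
    }
    buf[off++] = (uint8_t)(0xa0 | len);
    memcpy(buf + off, s, len);
    return off + len;
}

/* Build [tag, [[ts, {"log": msg}]], {"chunk": chunk}] into buf.
 * The caller must provide at least 128 bytes. Returns the encoded length. */
static size_t build_forward_msg(uint8_t *buf, const char *tag, uint32_t ts,
                                const char *msg, const char *chunk)
{
    size_t off = 0;

    buf[off++] = 0x93;                  /* fixarray: tag, entries, option */
    off = put_fixstr(buf, off, tag);
    buf[off++] = 0x91;                  /* fixarray: one entry */
    buf[off++] = 0x92;                  /* fixarray: [time, record] */
    buf[off++] = 0xce;                  /* uint32, big-endian */
    buf[off++] = (uint8_t)(ts >> 24);
    buf[off++] = (uint8_t)(ts >> 16);
    buf[off++] = (uint8_t)(ts >> 8);
    buf[off++] = (uint8_t)ts;
    buf[off++] = 0x81;                  /* fixmap: record with one pair */
    off = put_fixstr(buf, off, "log");
    off = put_fixstr(buf, off, msg);
    buf[off++] = 0x81;                  /* fixmap: option with one pair */
    off = put_fixstr(buf, off, "chunk");
    off = put_fixstr(buf, off, chunk);
    return off;
}
```

Writing many such messages to 127.0.0.1:24224 faster than the output drains them should eventually push the input past Mem_Buf_Limit, which is the condition described in steps 2 and 3.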

Expected behavior

Fluent Bit should handle the ACK response gracefully instead of crashing, even when active connections have been closed due to a memory buffer overlimit or other conditions.

Screenshots

Your Environment

  • Version used: latest
  • Configuration:
[SERVICE]
    Log_Level       info
    HTTP_Server     On
    HTTP_Listen     0.0.0.0
    HTTP_Port       8085
    Parsers_File    parsers.conf

[INPUT]
    Name                forward
    Listen              127.0.0.1
    Port                24224
    Buffer_Chunk_Size   512KB
    Buffer_Max_Size     2MB
    Mem_Buf_Limit       10MB

[OUTPUT]
    Name                null
    Match               *

  • Environment name and version (e.g. Kubernetes? What version?):
  • Server type and version:
  • Operating System and version: macOS
  • Filters and plugins:

Additional context

mirko-lazarevic • Sep 30 '24 08:09