fluent-bit icon indicating copy to clipboard operation
fluent-bit copied to clipboard

out_azure_kusto: fix SIGSEGV in nested mk_list_foreach_safe loops

Open yaananth opened this issue 1 month ago β€’ 1 comments

Disclaimer: Used copilot to investigate the crash we faced and took it's help to write this

Summary

Fixes a SIGSEGV crash in the Azure Kusto output plugin when processing buffered backlog data on startup.

Root Cause

The ingest_all_chunks() function had nested mk_list_foreach_safe loops that both used the same tmp variable as the iterator:

mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {     // outer loop
    ...
    mk_list_foreach_safe(f_head, tmp, &fs_stream->files) {  // inner loop - SAME tmp!

The mk_list_foreach_safe macro stores the "next" pointer in its second argument for safe iteration during list modification. When the inner loop overwrote tmp, it corrupted the outer loop's iteration state, causing undefined behavior and eventually a SIGSEGV when the outer loop tried to continue iteration with a corrupted pointer.

Crash Stack Trace (from production)

[engine] caught signal (SIGSEGV)
#0 flush_init() at plugins/out_azure_kusto/azure_kusto.c:1169
#1 cb_azure_kusto_flush() at plugins/out_azure_kusto/azure_kusto.c:1257
#2 co_init() at lib/monkey/deps/flb_libco/amd64.c:117

Fix

Add a dedicated f_tmp variable for the inner loop to prevent iterator corruption:

struct mk_list *tmp;
struct mk_list *f_tmp;  // NEW: dedicated inner loop iterator
...
mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {
    ...
    mk_list_foreach_safe(f_head, f_tmp, &fs_stream->files) {  // Use f_tmp

Testing

  • Added flb_test_azure_kusto_buffering_backlog regression test that exercises the buffering/backlog restart code path
  • All existing azure_kusto tests continue to pass
  • Verified build compiles without warnings
Test json_invalid...                            [ OK ]
Test managed_identity_system...                 [ OK ]
Test managed_identity_user...                   [ OK ]
Test service_principal...                       [ OK ]
Test workload_identity...                       [ OK ]
Test buffering_backlog...                       [ OK ]
SUCCESS: All unit tests have passed.

Changes

  • plugins/out_azure_kusto/azure_kusto.c: Add f_tmp iterator variable for inner loop (+2 lines, -1 line)
  • tests/runtime/out_azure_kusto.c: Add buffering backlog regression test (+73 lines)

Summary by CodeRabbit

  • Refactor

    • Improved internal iteration safety and stability; no public API changes.
  • Tests

    • Added an end-to-end buffering/backlog regression test that verifies two-phase write-and-restart behavior with cleanup.
    • Added runtime helpers and test scaffolding to support buffered-backlog testing and temporary resource cleanup.

✏️ Tip: You can customize this high-level summary in your review settings.

yaananth avatar Nov 26 '25 11:11 yaananth

Walkthrough

Updated the Azure Kusto output plugin to use a dedicated inner safety iterator variable; added a new runtime test that verifies buffering/backlog processing with a two-phase Fluent Bit run and accompanying recursive cleanup helpers.

Changes

Cohort / File(s) Summary
Iterator variable fix
plugins/out_azure_kusto/azure_kusto.c
Introduced a local f_tmp variable and switched the inner mk_list_foreach_safe to use f_tmp as the safety iterator while retaining f_head for element access. No public API changes.
Buffering backlog test & helpers
tests/runtime/out_azure_kusto.c
Added portability guards and new system headers, implemented flb_kusto_unlink_cb and flb_kusto_rm_rf helpers for recursive removal, declared and implemented flb_test_azure_kusto_buffering_backlog, and registered the test in TEST_LIST. The test runs twice (create disk-backed backlog, restart to process backlog) and performs buffer-dir cleanup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas to focus on:
    • Correctness of mk_list_foreach_safe usage with the new f_tmp iterator.
    • Test reliability: timing, deterministic waits, and teardown.
    • Portability and correctness of recursive file removal helpers (flb_kusto_rm_rf).

Suggested labels

backport to v4.1.x

Suggested reviewers

  • edsiper
  • koleini

Poem

πŸ‡ I hopped through loops and found a tidy spot,

f_tmp kept watch where list edges are fraught.
I tucked some carrots on disk with care,
Restarted the burrow β€” backlog cleared from there.
Happy hops for tests and tidy code, so spry!

Pre-merge checks and finishing touches

βœ… Passed checks (3 passed)
Check name Status Explanation
Description Check βœ… Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check βœ… Passed The title clearly and accurately summarizes the main change: fixing a SIGSEGV crash caused by nested mk_list_foreach_safe loops in the Azure Kusto plugin.
Docstring Coverage βœ… Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • [ ] πŸ“ Generate docstrings
πŸ§ͺ Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

πŸ“œ Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between 03323cff3f37e41c811a623e83f40690778cfc7a and 5602e1f7124082f8c51790122d73bcaf7cc11737.

πŸ“’ Files selected for processing (1)
  • tests/runtime/out_azure_kusto.c (5 hunks)
🧰 Additional context used
🧠 Learnings (6)
πŸ“š Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • tests/runtime/out_azure_kusto.c
πŸ“š Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/runtime/out_azure_kusto.c
πŸ“š Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • tests/runtime/out_azure_kusto.c
πŸ“š Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/runtime/out_azure_kusto.c
πŸ“š Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/runtime/out_azure_kusto.c
πŸ“š Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • tests/runtime/out_azure_kusto.c
πŸ”‡ Additional comments (6)
tests/runtime/out_azure_kusto.c (6)

3-5: LGTM!

Proper portability guard for POSIX.1-2008 features (nftw, stat, etc.) used by the test helpers.


27-31: LGTM!

All required headers for the test helpers and buffering_backlog test are present.


35-38: LGTM!

The nftw callback signature is correct and the implementation is appropriate for test cleanup.


40-49: LGTM!

The recursive removal logic is correct for test cleanup. The stat() check makes it idempotent, and the FTW_DEPTH | FTW_PHYS flags are appropriate for safe deletion.


59-59: LGTM!

Test prototype and registration follow the standard Fluent Bit test pattern.

Also applies to: 68-68


243-330: Well-structured regression test for the nested loop SIGSEGV fix.

The two-phase approach effectively exercises the backlog processing path (ingest_all_chunks) that previously crashed due to nested mk_list_foreach_safe with shared iterators. Key strengths:

  • PID-unique buffer path ensures test isolation
  • Cleanup before and after prevents stale data interference
  • First run buffers data to disk, second run processes backlog on startup
  • If the SIGSEGV occurs, the test will crash, validating the fix
  • Past review concerns about directory cleanup and mkdir error handling have been addressed

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❀️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 26 '25 11:11 coderabbitai[bot]

@edsiper @cosmo0920 πŸ‘‹ any feedback on this PR? thank you!

yaananth avatar Dec 19 '25 14:12 yaananth