fluent-bit out_azure_kusto: fix SIGSEGV in nested mk_list_foreach

Disclaimer: Used copilot to investigate the crash we faced and took it's help to write this

Summary

Fixes a SIGSEGV crash in the Azure Kusto output plugin when processing buffered backlog data on startup.

Root Cause

The ingest_all_chunks() function had nested mk_list_foreach_safe loops that both used the same tmp variable as the iterator:

mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {     // outer loop
    ...
    mk_list_foreach_safe(f_head, tmp, &fs_stream->files) {  // inner loop - SAME tmp!

The mk_list_foreach_safe macro stores the "next" pointer in its second argument for safe iteration during list modification. When the inner loop overwrote tmp, it corrupted the outer loop's iteration state, causing undefined behavior and eventually a SIGSEGV when the outer loop tried to continue iteration with a corrupted pointer.

Crash Stack Trace (from production)

[engine] caught signal (SIGSEGV)
#0 flush_init() at plugins/out_azure_kusto/azure_kusto.c:1169
#1 cb_azure_kusto_flush() at plugins/out_azure_kusto/azure_kusto.c:1257
#2 co_init() at lib/monkey/deps/flb_libco/amd64.c:117

Fix

Add a dedicated f_tmp variable for the inner loop to prevent iterator corruption:

struct mk_list *tmp;
struct mk_list *f_tmp;  // NEW: dedicated inner loop iterator
...
mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {
    ...
    mk_list_foreach_safe(f_head, f_tmp, &fs_stream->files) {  // Use f_tmp

Testing

Added flb_test_azure_kusto_buffering_backlog regression test that exercises the buffering/backlog restart code path
All existing azure_kusto tests continue to pass
Verified build compiles without warnings

Test json_invalid...                            [ OK ]
Test managed_identity_system...                 [ OK ]
Test managed_identity_user...                   [ OK ]
Test service_principal...                       [ OK ]
Test workload_identity...                       [ OK ]
Test buffering_backlog...                       [ OK ]
SUCCESS: All unit tests have passed.

Changes

plugins/out_azure_kusto/azure_kusto.c: Add f_tmp iterator variable for inner loop (+2 lines, -1 line)
tests/runtime/out_azure_kusto.c: Add buffering backlog regression test (+73 lines)

Summary by CodeRabbit

Refactor
- Improved internal iteration safety and stability; no public API changes.
Tests
- Added an end-to-end buffering/backlog regression test that verifies two-phase write-and-restart behavior with cleanup.
- Added runtime helpers and test scaffolding to support buffered-backlog testing and temporary resource cleanup.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Nov 26 '25 11:11 yaananth

Walkthrough

Updated the Azure Kusto output plugin to use a dedicated inner safety iterator variable; added a new runtime test that verifies buffering/backlog processing with a two-phase Fluent Bit run and accompanying recursive cleanup helpers.

Changes

Cohort / File(s)	Summary
Iterator variable fix `plugins/out_azure_kusto/azure_kusto.c`	Introduced a local `f_tmp` variable and switched the inner `mk_list_foreach_safe` to use `f_tmp` as the safety iterator while retaining `f_head` for element access. No public API changes.
Buffering backlog test & helpers `tests/runtime/out_azure_kusto.c`	Added portability guards and new system headers, implemented `flb_kusto_unlink_cb` and `flb_kusto_rm_rf` helpers for recursive removal, declared and implemented `flb_test_azure_kusto_buffering_backlog`, and registered the test in `TEST_LIST`. The test runs twice (create disk-backed backlog, restart to process backlog) and performs buffer-dir cleanup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas to focus on:
- Correctness of mk_list_foreach_safe usage with the new f_tmp iterator.
- Test reliability: timing, deterministic waits, and teardown.
- Portability and correctness of recursive file removal helpers (flb_kusto_rm_rf).

Suggested labels

backport to v4.1.x

Suggested reviewers

edsiper
koleini

Poem

🐇 I hopped through loops and found a tidy spot,

f_tmp kept watch where list edges are fraught.
I tucked some carrots on disk with care,
Restarted the burrow — backlog cleared from there.
Happy hops for tests and tidy code, so spry!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and accurately summarizes the main change: fixing a SIGSEGV crash caused by nested mk_list_foreach_safe loops in the Azure Kusto plugin.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

[ ] 📝 Generate docstrings

🧪 Generate unit tests (beta)

[ ] Create PR with unit tests
[ ] Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03323cff3f37e41c811a623e83f40690778cfc7a and 5602e1f7124082f8c51790122d73bcaf7cc11737.

📒 Files selected for processing (1)

tests/runtime/out_azure_kusto.c (5 hunks)

🧰 Additional context used

🧠 Learnings (6)

📚 Learning: 2025-11-21T06:23:29.770Z

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

tests/runtime/out_azure_kusto.c

📚 Learning: 2025-08-31T12:46:11.940Z

Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

tests/runtime/out_azure_kusto.c

📚 Learning: 2025-09-08T11:21:33.975Z

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

tests/runtime/out_azure_kusto.c

📚 Learning: 2025-08-29T06:24:26.170Z

Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

tests/runtime/out_azure_kusto.c

📚 Learning: 2025-08-29T06:24:55.855Z

Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

tests/runtime/out_azure_kusto.c

📚 Learning: 2025-08-31T12:46:11.940Z

Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

tests/runtime/out_azure_kusto.c

🔇 Additional comments (6)

tests/runtime/out_azure_kusto.c (6)

3-5: LGTM!

Proper portability guard for POSIX.1-2008 features (nftw, stat, etc.) used by the test helpers.

27-31: LGTM!

All required headers for the test helpers and buffering_backlog test are present.

35-38: LGTM!

The nftw callback signature is correct and the implementation is appropriate for test cleanup.

40-49: LGTM!

The recursive removal logic is correct for test cleanup. The stat() check makes it idempotent, and the FTW_DEPTH | FTW_PHYS flags are appropriate for safe deletion.

59-59: LGTM!

Test prototype and registration follow the standard Fluent Bit test pattern.

Also applies to: 68-68

243-330: Well-structured regression test for the nested loop SIGSEGV fix.

The two-phase approach effectively exercises the backlog processing path (ingest_all_chunks) that previously crashed due to nested mk_list_foreach_safe with shared iterators. Key strengths:

PID-unique buffer path ensures test isolation

Cleanup before and after prevents stale data interference

First run buffers data to disk, second run processes backlog on startup

If the SIGSEGV occurs, the test will crash, validating the fix

Past review concerns about directory cleanup and mkdir error handling have been addressed

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Nov 26 '25 11:11 coderabbitai[bot]

@edsiper @cosmo0920 👋 any feedback on this PR? thank you!

Dec 19 '25 14:12 yaananth

out_azure_kusto: fix SIGSEGV in nested mk_list_foreach_safe loops

Summary

Root Cause

Crash Stack Trace (from production)

Fix

Testing

Changes

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches