out_azure_kusto: fix SIGSEGV in nested mk_list_foreach_safe loops
Disclaimer: Used copilot to investigate the crash we faced and took it's help to write this
Summary
Fixes a SIGSEGV crash in the Azure Kusto output plugin when processing buffered backlog data on startup.
Root Cause
The ingest_all_chunks() function had nested mk_list_foreach_safe loops that both used the same tmp variable as the iterator:
mk_list_foreach_safe(head, tmp, &ctx->fs->streams) { // outer loop
...
mk_list_foreach_safe(f_head, tmp, &fs_stream->files) { // inner loop - SAME tmp!
The mk_list_foreach_safe macro stores the "next" pointer in its second argument for safe iteration during list modification. When the inner loop overwrote tmp, it corrupted the outer loop's iteration state, causing undefined behavior and eventually a SIGSEGV when the outer loop tried to continue iteration with a corrupted pointer.
Crash Stack Trace (from production)
[engine] caught signal (SIGSEGV)
#0 flush_init() at plugins/out_azure_kusto/azure_kusto.c:1169
#1 cb_azure_kusto_flush() at plugins/out_azure_kusto/azure_kusto.c:1257
#2 co_init() at lib/monkey/deps/flb_libco/amd64.c:117
Fix
Add a dedicated f_tmp variable for the inner loop to prevent iterator corruption:
struct mk_list *tmp;
struct mk_list *f_tmp; // NEW: dedicated inner loop iterator
...
mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {
...
mk_list_foreach_safe(f_head, f_tmp, &fs_stream->files) { // Use f_tmp
Testing
- Added
flb_test_azure_kusto_buffering_backlogregression test that exercises the buffering/backlog restart code path - All existing azure_kusto tests continue to pass
- Verified build compiles without warnings
Test json_invalid... [ OK ]
Test managed_identity_system... [ OK ]
Test managed_identity_user... [ OK ]
Test service_principal... [ OK ]
Test workload_identity... [ OK ]
Test buffering_backlog... [ OK ]
SUCCESS: All unit tests have passed.
Changes
plugins/out_azure_kusto/azure_kusto.c: Addf_tmpiterator variable for inner loop (+2 lines, -1 line)tests/runtime/out_azure_kusto.c: Add buffering backlog regression test (+73 lines)
Summary by CodeRabbit
-
Refactor
- Improved internal iteration safety and stability; no public API changes.
-
Tests
- Added an end-to-end buffering/backlog regression test that verifies two-phase write-and-restart behavior with cleanup.
- Added runtime helpers and test scaffolding to support buffered-backlog testing and temporary resource cleanup.
βοΈ Tip: You can customize this high-level summary in your review settings.
Walkthrough
Updated the Azure Kusto output plugin to use a dedicated inner safety iterator variable; added a new runtime test that verifies buffering/backlog processing with a two-phase Fluent Bit run and accompanying recursive cleanup helpers.
Changes
| Cohort / File(s) | Summary |
|---|---|
Iterator variable fixplugins/out_azure_kusto/azure_kusto.c |
Introduced a local f_tmp variable and switched the inner mk_list_foreach_safe to use f_tmp as the safety iterator while retaining f_head for element access. No public API changes. |
Buffering backlog test & helperstests/runtime/out_azure_kusto.c |
Added portability guards and new system headers, implemented flb_kusto_unlink_cb and flb_kusto_rm_rf helpers for recursive removal, declared and implemented flb_test_azure_kusto_buffering_backlog, and registered the test in TEST_LIST. The test runs twice (create disk-backed backlog, restart to process backlog) and performs buffer-dir cleanup. |
Estimated code review effort
π― 3 (Moderate) | β±οΈ ~20 minutes
- Areas to focus on:
- Correctness of
mk_list_foreach_safeusage with the newf_tmpiterator. - Test reliability: timing, deterministic waits, and teardown.
- Portability and correctness of recursive file removal helpers (
flb_kusto_rm_rf).
- Correctness of
Suggested labels
backport to v4.1.x
Suggested reviewers
- edsiper
- koleini
Poem
π I hopped through loops and found a tidy spot,
f_tmp kept watch where list edges are fraught.
I tucked some carrots on disk with care,
Restarted the burrow β backlog cleared from there.
Happy hops for tests and tidy code, so spry!
Pre-merge checks and finishing touches
β Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | β Passed | Check skipped - CodeRabbitβs high-level summary is enabled. |
| Title check | β Passed | The title clearly and accurately summarizes the main change: fixing a SIGSEGV crash caused by nested mk_list_foreach_safe loops in the Azure Kusto plugin. |
| Docstring Coverage | β Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
β¨ Finishing touches
- [ ] π Generate docstrings
π§ͺ Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
π Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
π₯ Commits
Reviewing files that changed from the base of the PR and between 03323cff3f37e41c811a623e83f40690778cfc7a and 5602e1f7124082f8c51790122d73bcaf7cc11737.
π Files selected for processing (1)
tests/runtime/out_azure_kusto.c(5 hunks)
π§° Additional context used
π§ Learnings (6)
π Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
tests/runtime/out_azure_kusto.c
π Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
tests/runtime/out_azure_kusto.c
π Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
tests/runtime/out_azure_kusto.c
π Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
tests/runtime/out_azure_kusto.c
π Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/runtime/out_azure_kusto.c
π Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
tests/runtime/out_azure_kusto.c
π Additional comments (6)
tests/runtime/out_azure_kusto.c (6)
3-5: LGTM!Proper portability guard for POSIX.1-2008 features (nftw, stat, etc.) used by the test helpers.
27-31: LGTM!All required headers for the test helpers and buffering_backlog test are present.
35-38: LGTM!The nftw callback signature is correct and the implementation is appropriate for test cleanup.
40-49: LGTM!The recursive removal logic is correct for test cleanup. The stat() check makes it idempotent, and the FTW_DEPTH | FTW_PHYS flags are appropriate for safe deletion.
59-59: LGTM!Test prototype and registration follow the standard Fluent Bit test pattern.
Also applies to: 68-68
243-330: Well-structured regression test for the nested loop SIGSEGV fix.The two-phase approach effectively exercises the backlog processing path (ingest_all_chunks) that previously crashed due to nested mk_list_foreach_safe with shared iterators. Key strengths:
- PID-unique buffer path ensures test isolation
- Cleanup before and after prevents stale data interference
- First run buffers data to disk, second run processes backlog on startup
- If the SIGSEGV occurs, the test will crash, validating the fix
- Past review concerns about directory cleanup and mkdir error handling have been addressed
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
@edsiper @cosmo0920 π any feedback on this PR? thank you!