
out_azure_blob: add log_key option

Open • tomekwilk opened this issue 11 months ago • 15 comments

This PR is based on PR #3668 but addresses Azure Blob Storage. The azure_blob plugin was modified to accept a 'log_key' option. By default, the entire log record is sent to storage. When the 'log_key' option is specified in the output plugin configuration, only the value of that key is sent to the storage blob.

Addresses #9721

Enter [N/A] in the box if an item is not applicable to your change.

Testing

Before we can approve your change, please submit the following in a comment:

  • [x] Example configuration file for the change
  • [x] Debug log output from testing the change
  • [x] Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • [x] Documentation required for this feature

Doc PR https://github.com/fluent/fluent-bit-docs/pull/1540


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

By default, the entire record is sent to Azure Blob Storage. Here is a sample configuration and the default output:

Configuration

[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      dummy
    dummy     {"name": "Fluent Bit", "year": 2020}
    samples   1
    tag       var.log.containers.app-default-96cbdef2340.log

[OUTPUT]
    name                  azure_blob
    match                 *
    account_name          twilk123
    shared_key            <snip>
    path                  kubernetes
    container_name        test-container
    auto_create_container on
    tls                   on

Record without log_key:

{"@timestamp":"2025-01-02T16:56:02.906357Z","name":"Fluent Bit","year":2020}

If 'log_key' is specified, only the value of that key is sent to Azure Blob Storage.
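To make the two modes concrete, here is a minimal C sketch of the selection logic under simplifying assumptions. It models a record as a flat array of string or integer fields instead of msgpack, and `struct field`, `put_value()` and `render_payload()` are hypothetical names for illustration, not the plugin's code. With no log_key the whole record is rendered as a JSON object; with a log_key only that key's raw value is produced. Unlike the real default output above, the sketch adds no `@timestamp` field and performs no JSON escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flat record model (the plugin itself works on msgpack). */
enum val_type { VAL_STR, VAL_INT };

struct field {
    const char *key;
    enum val_type type;
    const char *str;   /* used when type == VAL_STR */
    long long num;     /* used when type == VAL_INT */
};

/* Write one value; strings are quoted only in JSON mode.
 * No JSON escaping is performed: illustration only. */
static int put_value(char *buf, size_t size, const struct field *f, int as_json)
{
    if (f->type == VAL_STR) {
        return snprintf(buf, size, as_json ? "\"%s\"" : "%s", f->str);
    }
    return snprintf(buf, size, "%lld", f->num);
}

/* Build the payload for one record.
 *   log_key == NULL -> whole record as a JSON object (default mode)
 *   log_key != NULL -> only the raw value stored under that key
 * Returns bytes written (excluding the NUL), or -1 if the key is
 * missing or buf is too small. */
static int render_payload(const struct field *rec, size_t n,
                          const char *log_key, char *buf, size_t size)
{
    size_t off = 0;
    int w;

    assert(buf != NULL && size > 0);

    if (log_key != NULL) {
        for (size_t i = 0; i < n; i++) {
            if (strcmp(rec[i].key, log_key) == 0) {
                w = put_value(buf, size, &rec[i], 0);
                return (w < 0 || (size_t) w >= size) ? -1 : w;
            }
        }
        return -1;                      /* key absent from this record */
    }

    if (size < 2) {
        return -1;
    }
    buf[off++] = '{';
    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + off, size - off, "%s\"%s\":", i > 0 ? "," : "", rec[i].key);
        if (w < 0 || (size_t) w >= size - off) {
            return -1;
        }
        off += (size_t) w;
        w = put_value(buf + off, size - off, &rec[i], 1);
        if (w < 0 || (size_t) w >= size - off) {
            return -1;
        }
        off += (size_t) w;
    }
    if (off + 1 >= size) {
        return -1;
    }
    buf[off++] = '}';
    buf[off] = '\0';
    return (int) off;
}
```

For the record {"name": "Fluent Bit", "year": 2020}, `render_payload(rec, 2, NULL, ...)` yields `{"name":"Fluent Bit","year":2020}` and `render_payload(rec, 2, "name", ...)` yields `Fluent Bit`, mirroring the two outputs described in this PR. Returning -1 for a missing key is an assumption of the sketch; this thread does not say how the plugin handles records that lack the key.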

Sample configuration with log_key

[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      dummy
    dummy     {"name": "Fluent Bit", "year": 2020}
    samples   1
    tag       var.log.containers.app-default-96cbdef2340.log

[OUTPUT]
    name                  azure_blob
    match                 *
    account_name          twilk123
    shared_key            <snip>
    path                  kubernetes
    container_name        test-container
    auto_create_container on
    tls                   on
    log_key               name

Record with log_key set to name:

Fluent Bit

Example Valgrind output

root@fluent-bit:/tmp# valgrind ./fluent-bit -c azure.conf
==3022== Memcheck, a memory error detector
==3022== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==3022== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==3022== Command: ./fluent-bit -c azure.conf
==3022==
Fluent Bit v3.2.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  _____
|  ___| |                | |   | ___ (_) |         |____ |/ __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \  / /
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)_____/


[2025/01/02 19:56:50] [ info] [fluent bit] version=3.2.3, commit=addf261e8c, pid=3022
[2025/01/02 19:56:50] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/01/02 19:56:50] [ info] [simd    ] disabled
[2025/01/02 19:56:50] [ info] [cmetrics] version=0.9.9
[2025/01/02 19:56:50] [ info] [ctraces ] version=0.5.7
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/01/02 19:56:50] [ info] [input:dummy:dummy.0] initializing
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] worker #0 started
[2025/01/02 19:56:50] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/01/02 19:56:51] [ info] [output:azure_blob:azure_blob.0] account_name=twilk123, container_name=test-container, blob_type=appendblob, emulator_mode=no, endpoint=twilk123.blob.core.windows.net, auth_type=key
[2025/01/02 19:56:51] [ info] [sp] stream processor started
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] container 'test-container' already exists
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] content uploaded successfully:
[2025/01/02 19:56:54] [ info] [output:azure_blob:azure_blob.0] blob id (null) committed successfully
^C[2025/01/02 19:57:03] [engine] caught signal (SIGINT)
[2025/01/02 19:57:03] [ warn] [engine] service will shutdown in max 5 seconds
[2025/01/02 19:57:03] [ info] [input] pausing dummy.0
[2025/01/02 19:57:03] [ info] [engine] service has stopped (0 pending tasks)
[2025/01/02 19:57:03] [ info] [input] pausing dummy.0
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopping...
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/01/02 19:57:03] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopped
==3022==
==3022== HEAP SUMMARY:
==3022==     in use at exit: 0 bytes in 0 blocks
==3022==   total heap usage: 17,894 allocs, 17,894 frees, 2,471,158 bytes allocated
==3022==
==3022== All heap blocks were freed -- no leaks are possible
==3022==
==3022== For lists of detected and suppressed errors, rerun with: -s
==3022== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


Summary by CodeRabbit

  • New Features
    • Added log_key configuration option for Azure Blob Storage output plugin that allows extracting a specific field from incoming log records; when configured, only the value of the designated key will be sent to Azure Blob Storage.


(tomekwilk, Jan 02 '25)

@edsiper Can you please give us an update?

(adrinaula, Mar 13 '25)

Memory leak test after the rewrite:

$ valgrind build/bin/fluent-bit -c fluentbit.cfg
==225827== Memcheck, a memory error detector
==225827== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==225827== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==225827== Command: build/bin/fluent-bit -c fluentbit.cfg
==225827==
Fluent Bit v4.0.3
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _             ___  _____
|  ___| |                | |   | ___ (_) |           /   ||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/


[2025/06/11 14:22:02] [ info] [fluent bit] version=4.0.3, commit=97285bdd2a, pid=225827
[2025/06/11 14:22:03] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/06/11 14:22:03] [ info] [simd    ] disabled
[2025/06/11 14:22:03] [ info] [cmetrics] version=1.0.2
[2025/06/11 14:22:03] [ info] [ctraces ] version=0.6.6
[2025/06/11 14:22:03] [ info] [input:dummy:dummy.0] initializing
[2025/06/11 14:22:03] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] account_name=devstoreaccount1, container_name=logs, blob_type=appendblob, emulator_mode=yes, endpoint=http://127.0.0.1:10000, auth_type=key
[2025/06/11 14:22:03] [ info] [sp] stream processor started
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/06/11 14:22:03] [ info] [output:azure_blob:azure_blob.0] worker #0 started
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] container 'logs' already exists
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] content uploaded successfully:
[2025/06/11 14:22:05] [ info] [output:azure_blob:azure_blob.0] blob id (null) committed successfully
^C[2025/06/11 14:22:18] [engine] caught signal (SIGINT)
[2025/06/11 14:22:18] [ warn] [engine] service will shutdown in max 5 seconds
[2025/06/11 14:22:18] [ info] [input] pausing dummy.0
[2025/06/11 14:22:18] [ info] [engine] service has stopped (0 pending tasks)
[2025/06/11 14:22:18] [ info] [input] pausing dummy.0
[2025/06/11 14:22:18] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopping...
[2025/06/11 14:22:18] [ info] [output:azure_blob:azure_blob.0] initializing worker
==225827==
==225827== HEAP SUMMARY:
==225827==     in use at exit: 0 bytes in 0 blocks
==225827==   total heap usage: 7,292 allocs, 7,292 frees, 1,413,601 bytes allocated
==225827==
==225827== All heap blocks were freed -- no leaks are possible
==225827==
==225827== For lists of detected and suppressed errors, rerun with: -s
==225827== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

(tomekwilk, Jun 11 '25)

Hi @tomekwilk @lockewritesdocs, just checking in: are there any blockers preventing this PR from being merged? Let me know if there's anything I can do to help move it forward.

(khalillilahk, Sep 29 '25)

Walkthrough

The Azure Blob output plugin has been enhanced with a log_key feature that enables extraction of a specific field from incoming msgpack data. The azure_blob_format function signature has been expanded to accept additional metadata and context parameters, complemented by a new helper function that handles field extraction and type conversion when log_key is configured.

Changes

  • Log key extraction and formatting logic (plugins/out_azure_blob/azure_blob.c): Added static helper function cb_azb_msgpack_extract_log_key to locate and extract a specific field from msgpack data via record accessor, with support for string, float, and int types. Updated azure_blob_format signature to accept additional parameters (flush context, event type, tag, data payload) and return formatted output via pointer parameters. Function now conditionally routes through log key extraction when configured. Added new header includes: flb_record_accessor.h and flb_ra_key.h.
  • Configuration support (plugins/out_azure_blob/azure_blob.c): Added log_key configuration entry to the public config map for struct flb_azure_blob, exposing the field as a string configuration option.
  • Data structure (plugins/out_azure_blob/azure_blob.h): Added flb_sds_t log_key field to struct flb_azure_blob for storing the configured log key.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Record accessor API usage: Verify correct usage of record accessor to locate and extract fields from msgpack data
  • Type conversion logic: Review string/float/int conversion paths and null-termination handling in the helper function
  • Function signature impact: Trace how the expanded azure_blob_format signature integrates with plugin callback mechanisms and any callers
  • Error handling paths: Ensure robust cleanup and error reporting for missing fields, unsupported types, and allocation failures
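The conversion step described for cb_azb_msgpack_extract_log_key (string, float and int values, with null-termination) and the error paths in the review points above can be sketched in plain C. This is an illustration under assumptions, not the plugin's implementation: `struct value` is a hypothetical stand-in for a decoded msgpack object, `value_to_text()` is a hypothetical name, and the `%g` float format is a guess. It returns a newly allocated, NUL-terminated buffer through `out_data`/`out_size`, in the style of the expanded azure_blob_format signature, and fails on unsupported types or allocation errors.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tagged value standing in for a decoded msgpack object. */
enum kind { K_STR, K_INT, K_FLOAT, K_BOOL };

struct value {
    enum kind kind;
    const char *str;   /* K_STR: bytes, NOT necessarily NUL-terminated */
    size_t str_len;    /* K_STR: number of bytes in str */
    long long i;       /* K_INT */
    double f;          /* K_FLOAT */
    int b;             /* K_BOOL: deliberately unsupported below */
};

/* Convert a string/int/float value into a freshly malloc'd,
 * NUL-terminated buffer returned through out_data/out_size.
 * Returns 0 on success (caller frees *out_data), or -1 on an
 * unsupported type or allocation failure. */
static int value_to_text(const struct value *v, void **out_data, size_t *out_size)
{
    char tmp[64];
    char *buf;
    int n;

    assert(v != NULL && out_data != NULL && out_size != NULL);

    switch (v->kind) {
    case K_STR:
        buf = malloc(v->str_len + 1);
        if (buf == NULL) {
            return -1;
        }
        if (v->str_len > 0) {
            memcpy(buf, v->str, v->str_len);
        }
        buf[v->str_len] = '\0';      /* add the terminator ourselves */
        *out_data = buf;
        *out_size = v->str_len;
        return 0;
    case K_INT:
        n = snprintf(tmp, sizeof(tmp), "%lld", v->i);
        break;
    case K_FLOAT:
        n = snprintf(tmp, sizeof(tmp), "%g", v->f);
        break;
    default:
        return -1;                   /* bool, map, array, ...: rejected */
    }

    if (n < 0 || (size_t) n >= sizeof(tmp)) {
        return -1;
    }
    buf = malloc((size_t) n + 1);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, tmp, (size_t) n + 1); /* includes the NUL from snprintf */
    *out_data = buf;
    *out_size = (size_t) n;
    return 0;
}
```

Copying strings by explicit length matters because a msgpack string carries its size and its bytes are not NUL-terminated; rejecting other types outright keeps the error path explicit instead of uploading an empty or partial value.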

Suggested reviewers

  • leonardo-albertovich
  • koleini
  • fujimotos


Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: adding a log_key option to the azure_blob output plugin.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

(coderabbitai[bot], Sep 30 '25)

I rebased the PR to resolve the merge conflicts after recent master changes. This PR is waiting to be re-reviewed and merged. Not sure if there is anything else for me to do.

(tomekwilk, Sep 30 '25)

Hello @edsiper , @adrinaula ,

This PR tackles an issue that we've also recently faced. Is there anything preventing or blocking the merge?

We would be interested in contributing if needed :)

Thanks in advance,

(SamerJ, Oct 15 '25)

@tomekwilk Eduardo requested a change, can you take a look at fixing?

(eschabell, Oct 21 '25)

@tomekwilk Eduardo requested a change, can you take a look at fixing?

Which change are we talking about? This one: "flb_errno() needs to be called before flb_plg_error()"?

If we can help in any way, don't hesitate to ask. We have exactly the same requirement, but we don't want to create a new PR that does exactly what @tomekwilk did.

(overmeulen, Nov 03 '25)

I fixed the one place where flb_errno() was called after flb_plg_error() and rebased the PR. I'm not sure what else could be blocking this PR. I requested a re-review after addressing the initial comments but heard nothing back.
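The ordering matters because errno is only meaningful immediately after the failing call: a logging call such as flb_plg_error() may itself invoke library functions that change errno before flb_errno() gets to read it. A minimal sketch of the safe pattern in standard C, assuming a POSIX system where fopen() sets errno; `log_error()` is a hypothetical stand-in for the plugin logger that deliberately clobbers errno to make the hazard visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logger standing in for flb_plg_error(): real logging
 * code may call functions that change errno; here it is forced. */
static void log_error(const char *msg)
{
    fprintf(stderr, "[error] %s\n", msg);
    errno = 0;                                /* simulate errno being overwritten */
}

/* Open a file for reading and report failures. errno is saved right
 * after the failing call, before any logging can change it.
 * Returns 0 on success or the saved errno value on failure. */
static int open_and_report(const char *path)
{
    FILE *fp;
    int saved;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp == NULL) {
        saved = errno;                        /* 1. capture errno first */
        log_error("cannot open input file");  /* 2. then log */
        fprintf(stderr, "errno=%d (%s)\n", saved, strerror(saved));
        return saved;
    }
    fclose(fp);
    return 0;
}
```

Reading errno after log_error() here would always yield 0. Calling a reporting helper such as flb_errno() before the log call, as in the fix described above, has the same effect as saving errno into a local variable first.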

If anyone would like to help push this PR forward or verify the change, feel free; it would be appreciated. I am currently traveling and have limited access. Thanks!

(tomekwilk, Nov 03 '25)

@eschabell @edsiper what's missing to validate this PR?

(overmeulen, Nov 12 '25)

Hello, when will this fix be released?

(jadmourad, Nov 27 '25)

@eschabell @edsiper what's missing to validate this PR?

Hey @overmeulen, it looks like it's waiting on the changes requested by the reviewer?

(eschabell, Dec 03 '25)

I removed the log_key cleanup from flb_azure_blob_conf_destroy(). That cleanup had been suggested by coderabbitai and was causing a double-free error: log_key is part of the config map and is freed when the plugin instance is destroyed.
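The double free follows from ownership: per the description above, log_key is allocated and released through the config map, so the plugin context only borrows the pointer. A minimal C sketch of that rule, using hypothetical types (`struct config_map`, `struct plugin_ctx`) and a counting `release()` helper in place of Fluent Bit's real config map API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner: the config map allocates option strings. */
struct config_map {
    char *log_key;
};

/* Hypothetical plugin context: borrows log_key, owns its own buffer. */
struct plugin_ctx {
    const char *log_key;   /* borrowed; never freed here */
    char *buffer;          /* owned by the plugin */
};

static int frees;          /* number of blocks released via release() */

static void release(void *p)
{
    if (p != NULL) {
        frees++;
        free(p);
    }
}

/* Parse time: the config map stores its own copy of the option value. */
static int config_map_set_log_key(struct config_map *map, const char *value)
{
    size_t len = strlen(value) + 1;

    map->log_key = malloc(len);
    if (map->log_key == NULL) {
        return -1;
    }
    memcpy(map->log_key, value, len);
    return 0;
}

/* Plugin teardown: release only what the plugin allocated itself. */
static void plugin_ctx_destroy(struct plugin_ctx *ctx)
{
    assert(ctx != NULL);
    release(ctx->buffer);
    ctx->buffer = NULL;
    ctx->log_key = NULL;   /* drop the borrowed pointer, do not free it */
}

/* Instance teardown: the owner frees the option value exactly once. */
static void config_map_destroy(struct config_map *map)
{
    release(map->log_key);
    map->log_key = NULL;
}
```

If plugin_ctx_destroy() also released ctx->log_key, the later config_map_destroy() call would free the same pointer a second time, which Valgrind reports as an invalid free.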

Here is the updated Valgrind output. I believe the error below is not related to this PR.

~/fluent-bit/build (dev-vm-461200)$ valgrind bin/fluent-bit -c fluentbit.conf                                    
==22945== Memcheck, a memory error detector
==22945== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==22945== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==22945== Command: bin/fluent-bit -c fluentbit.conf
==22945== 
Fluent Bit v4.2.1
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF graduated project under the Fluent organization
* https://fluentbit.io

______ _                  _    ______ _ _             ___   _____ 
|  ___| |                | |   | ___ (_) |           /   | / __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| | `' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| |   / /  
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |_./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)_____/
                                                                  
             Fluent Bit v4.2 – Direct Routes Ahead
         Celebrating 10 Years of Open, Fluent Innovation!

[2025/12/03 17:21:41.239025245] [ info] [fluent bit] version=4.2.1, commit=d9749a9eff, pid=22945
[2025/12/03 17:21:41.298763381] [ info] [storage] ver=1.5.4, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/12/03 17:21:41.299828308] [ info] [simd    ] disabled
[2025/12/03 17:21:41.512516957] [ info] [output:azure_blob:azure_blob.0] initializing worker
[2025/12/03 17:21:41.300483317] [ info] [cmetrics] version=1.0.5
[2025/12/03 17:21:41.515441205] [ info] [output:azure_blob:azure_blob.0] worker #0 started
[2025/12/03 17:21:41.301077789] [ info] [ctraces ] version=0.6.6
[2025/12/03 17:21:41.332743693] [ info] [input:dummy:dummy.0] initializing
[2025/12/03 17:21:41.334188627] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2025/12/03 17:21:41.421599808] [ info] [output:azure_blob:azure_blob.0] account_name=devstoreaccount1, container_name=test-container, blob_type=appendblob, emulator_mode=yes, endpoint=http://127.0.0.1:10000, auth_type=key
[2025/12/03 17:21:41.484604743] [ info] [sp] stream processor started
[2025/12/03 17:21:41.489661070] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
==22945== Warning: client switching stacks?  SP change: 0x7b42548 --> 0x6062190
==22945==          to suppress, use: --max-stackframe=28181432 or greater
==22945== Warning: client switching stacks?  SP change: 0x6062078 --> 0x7b42548
==22945==          to suppress, use: --max-stackframe=28181712 or greater
==22945== Warning: client switching stacks?  SP change: 0x7b42548 --> 0x6062078
==22945==          to suppress, use: --max-stackframe=28181712 or greater
==22945==          further instances of this message will not be shown.
[2025/12/03 17:21:43.625144119] [ info] [output:azure_blob:azure_blob.0] container 'test-container' already exists
==22945== Thread 4 flb-out-azure_bl:
==22945== Conditional jump or move depends on uninitialised value(s)
==22945==    at 0x484F229: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==22945==    by 0x531FDA7: __printf_buffer (vfprintf-process-arg.c:435)
==22945==    by 0x5344D90: __vsnprintf_internal (vsnprintf.c:96)
==22945==    by 0x5344D90: vsnprintf (vsnprintf.c:103)
==22945==    by 0x25F9B9: flb_sds_printf (flb_sds.c:357)
==22945==    by 0x7BE148: azb_block_blob_uri_commit (azure_blob_blockblob.c:133)
==22945==    by 0x7BEC42: azb_block_blob_commit_block (azure_blob_blockblob.c:341)
==22945==    by 0x798E06: send_blob (azure_blob.c:637)
==22945==    by 0x79F0B0: cb_azure_blob_flush (azure_blob.c:1753)
==22945==    by 0x2969DF: output_pre_cb_flush (flb_output.h:706)
==22945==    by 0x167C06A: co_init (amd64.c:117)
==22945== 
[2025/12/03 17:21:43.686486905] [ info] [output:azure_blob:azure_blob.0] content uploaded successfully: 
[2025/12/03 17:21:43.706190168] [ info] [output:azure_blob:azure_blob.0] blob id (null) committed successfully
^C[2025/12/03 17:22:10] [engine] caught signal (SIGINT)
[2025/12/03 17:22:10.573687209] [ warn] [engine] service will shutdown in max 5 seconds
[2025/12/03 17:22:10.574904549] [ info] [engine] pausing all inputs..
[2025/12/03 17:22:10.576506925] [ info] [input] pausing dummy.0
[2025/12/03 17:22:10.889106628] [ info] [engine] service has stopped (0 pending tasks)
[2025/12/03 17:22:10.889598275] [ info] [input] pausing dummy.0
[2025/12/03 17:22:10.894383437] [ info] [output:azure_blob:azure_blob.0] thread worker #0 stopping...
[2025/12/03 17:22:10.896913640] [ info] [output:azure_blob:azure_blob.0] initializing worker
==22945== 
==22945== HEAP SUMMARY:
==22945==     in use at exit: 0 bytes in 0 blocks
==22945==   total heap usage: 8,901 allocs, 8,901 frees, 1,943,861 bytes allocated
==22945== 
==22945== All heap blocks were freed -- no leaks are possible
==22945== 
==22945== Use --track-origins=yes to see where uninitialised values come from
==22945== For lists of detected and suppressed errors, rerun with: -s
==22945== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

(tomekwilk, Dec 03 '25)

@eschabell I believe all of @edsiper's comments have been addressed. What am I missing?

(tomekwilk, Dec 03 '25)

@edsiper @cosmo0920 @lecaros can someone look at this as reviewers for @tomekwilk, he's doing his part and waiting on feedback.

(eschabell, Dec 10 '25)