out_es: add cloud_apikey configuration

Open soedar opened this issue 1 year ago • 17 comments

Adds Elastic Cloud API Key support to the out_es plugin. This patch adds a new config option, cloud_apikey, which would be added to the HTTP request through the Authorization: Apikey <cloud_apikey> header.
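For illustration, the header value described above could be assembled roughly as follows. This is a minimal sketch and not the code in this patch: build_apikey_header is a hypothetical helper, and the Apikey scheme spelling simply mirrors the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): formats the value of the
 * Authorization header as "Apikey <key>" into a caller-provided buffer.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if the buffer is too small.
 */
static int build_apikey_header(const char *apikey, char *out, size_t out_size)
{
    int n;

    assert(apikey != NULL && out != NULL);

    n = snprintf(out, out_size, "Apikey %s", apikey);
    if (n < 0 || (size_t) n >= out_size) {
        /* Output was truncated or an encoding error occurred. */
        return -1;
    }
    return n;
}
```

The caller would then attach the resulting string as the value of the Authorization header on the outgoing request.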

Addresses #6727. While we could re-use the cloud_auth config option, we would have to make additional assumptions about the API key to identify it properly (e.g. that it does not contain :, is base64 encoded, etc).
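To make those assumptions concrete, here is a sketch of the kind of check that overloading cloud_auth would need. looks_like_apikey is a hypothetical name and is not part of this patch; the rules (no colon, length a multiple of four, only base64 characters with up to two trailing = signs) are only the assumptions listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical heuristic (not part of the patch): returns 1 if the value
 * looks like a base64-encoded API key rather than "user:password",
 * 0 otherwise.
 */
static int looks_like_apikey(const char *value)
{
    size_t len;
    size_t i;

    assert(value != NULL);

    len = strlen(value);
    /* Padded base64 always has a non-zero length that is a multiple of 4. */
    if (len == 0 || len % 4 != 0) {
        return 0;
    }

    /* Allow at most two trailing '=' padding characters. */
    if (value[len - 1] == '=') {
        len--;
    }
    if (value[len - 1] == '=') {
        len--;
    }

    /* Everything else must be in the base64 alphabet; ':' is rejected here. */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) value[i];
        if (!isalnum(c) && c != '+' && c != '/') {
            return 0;
        }
    }
    return 1;
}
```

A short value such as abcd passes this check even though it may not be an API key, which is the sort of ambiguity a dedicated cloud_apikey option avoids.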


Enter [N/A] in the box, if an item is not applicable to your change.

Testing

Before we can approve your change, please submit the following in a comment:

  • [x] Example configuration file for the change
  • [x] Debug log output from testing the change
  • [x] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [x] Documentation required for this feature

https://github.com/fluent/fluent-bit-docs/pull/1213

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

soedar avatar Sep 18 '23 03:09 soedar

Example configuration

[SERVICE]
    Flush     1
    Daemon    off
    Log_Level debug

[INPUT]
    Name      cpu

[OUTPUT]
    Name      stdout
    Match     *

[OUTPUT]
    Name                es
    Match               *
    tls                 On
    tls.verify          Off
    Cloud_Id            <redacted>
    Cloud_Apikey        <redacted>
    Suppress_Type_Name  On

Debug output and Valgrind

$ valgrind ./bin/fluent-bit -c es.conf
==70111== Memcheck, a memory error detector
==70111== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==70111== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==70111== Command: ./bin/fluent-bit -c es.conf
==70111==
Fluent Bit v2.1.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/09/18 02:55:15] [ info] Configuration:
[2023/09/18 02:55:15] [ info]  flush time     | 1.000000 seconds
[2023/09/18 02:55:15] [ info]  grace          | 5 seconds
[2023/09/18 02:55:15] [ info]  daemon         | 0
[2023/09/18 02:55:15] [ info] ___________
[2023/09/18 02:55:15] [ info]  inputs:
[2023/09/18 02:55:15] [ info]      cpu
[2023/09/18 02:55:15] [ info] ___________
[2023/09/18 02:55:15] [ info]  filters:
[2023/09/18 02:55:15] [ info] ___________
[2023/09/18 02:55:15] [ info]  outputs:
[2023/09/18 02:55:15] [ info]      stdout.0
[2023/09/18 02:55:15] [ info]      es.1
[2023/09/18 02:55:15] [ info] ___________
[2023/09/18 02:55:15] [ info]  collectors:
[2023/09/18 02:55:15] [ info] [fluent bit] version=2.1.10, commit=b777d90050, pid=70111
[2023/09/18 02:55:15] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/09/18 02:55:15] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/09/18 02:55:15] [ info] [cmetrics] version=0.6.3
[2023/09/18 02:55:15] [ info] [output:stdout:stdout.0] worker #0 started
[2023/09/18 02:55:15] [ info] [ctraces ] version=0.3.1
[2023/09/18 02:55:15] [ info] [input:cpu:cpu.0] initializing
[2023/09/18 02:55:15] [ info] [input:cpu:cpu.0] storage_strategy='memory' (memory only)
[2023/09/18 02:55:15] [debug] [cpu:cpu.0] created event channels: read=21 write=22
[2023/09/18 02:55:15] [debug] [stdout:stdout.0] created event channels: read=23 write=24
[2023/09/18 02:55:15] [debug] [es:es.1] created event channels: read=30 write=31
[2023/09/18 02:55:16] [debug] [output:es:es.1] extracted cloud_host: '<redacted>'
[2023/09/18 02:55:16] [debug] [output:es:es.1] cloud_host: '<redacted>' does not contain a port: '<redacted>'
[2023/09/18 02:55:16] [ info] [output:es:es.1] worker #1 started
[2023/09/18 02:55:16] [ info] [output:es:es.1] worker #0 started
[2023/09/18 02:55:16] [debug] [output:es:es.1] checked whether extracted port was null and set it to default https port or not. Outcome: '443' and cloud_host: '<redacted>'.
[2023/09/18 02:55:16] [debug] [output:es:es.1] host=<redacted> port=443 uri=/_bulk index=fluent-bit type=_doc
[2023/09/18 02:55:16] [debug] [router] match rule cpu.0:stdout.0
[2023/09/18 02:55:16] [debug] [router] match rule cpu.0:es.1
[2023/09/18 02:55:16] [ info] [sp] stream processor started
[2023/09/18 02:55:17] [debug] [input chunk] update output instances with new chunk size diff=207, records=1, input=cpu.0
^C[2023/09/18 02:55:17] [engine] caught signal (SIGINT)
[2023/09/18 02:55:17] [debug] [task] created task=0x5322eb0 id=0 OK
[0] cpu.0: [[1695005716.948405812, {}], {"cpu_p"=>34.000000, "user_p"=>33.000000, "system_p"=>1.000000, "cpu0.p_cpu"=>7.000000, "cpu0.p_user"=>6.000000, "cpu0.p_system"=>1.000000, "cpu1.p_cpu"=>62.000000, "cpu1.p_user"=>61.000000, "cpu1.p_system"=>1.000000}]
[2023/09/18 02:55:17] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2023/09/18 02:55:17] [debug] [out flush] cb_destroy coro_id=0
[2023/09/18 02:55:17] [debug] [output:es:es.1] task_id=0 assigned to thread #0
[2023/09/18 02:55:17] [ warn] [engine] service will shutdown in max 5 seconds
[2023/09/18 02:55:17] [ info] [input] pausing cpu.0
[2023/09/18 02:55:18] [ info] [task] cpu/cpu.0 has 1 pending task(s):
[2023/09/18 02:55:18] [ info] [task]   task_id=0 still running on route(s): stdout/stdout.0 es/es.1
[2023/09/18 02:55:18] [ info] [input] pausing cpu.0
[2023/09/18 02:55:18] [debug] [upstream] KA connection #60 to <redacted>:443 is connected
[2023/09/18 02:55:18] [debug] [http_client] not using http_proxy for header
[2023/09/18 02:55:18] [debug] [output:es:es.1] using elastic cloud apikey
[2023/09/18 02:55:18] [debug] [output:es:es.1] HTTP Status=200 URI=/_bulk
[2023/09/18 02:55:18] [debug] [output:es:es.1] Elasticsearch response
{"took":31,"errors":false,"items":[{"create":{"_index":"fluent-bit","_id":"fko2pooBqbJxt1RguQ8l","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":31,"_primary_term":1,"status":201}}]}
[2023/09/18 02:55:18] [debug] [upstream] KA connection #60 to <redacted>:443 is now available
[2023/09/18 02:55:18] [debug] [task] destroy task=0x5322eb0 (task_id=0)
[2023/09/18 02:55:18] [debug] [out flush] cb_destroy coro_id=0
[2023/09/18 02:55:18] [ info] [input] pausing cpu.0
[2023/09/18 02:55:19] [ info] [engine] service has stopped (0 pending tasks)
[2023/09/18 02:55:19] [ info] [input] pausing cpu.0
[2023/09/18 02:55:19] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2023/09/18 02:55:19] [ info] [output:stdout:stdout.0] thread worker #0 stopped
[2023/09/18 02:55:20] [ info] [output:es:es.1] thread worker #0 stopping...
[2023/09/18 02:55:20] [ info] [output:es:es.1] thread worker #0 stopped
[2023/09/18 02:55:20] [ info] [output:es:es.1] thread worker #1 stopping...
[2023/09/18 02:55:20] [ info] [output:es:es.1] thread worker #1 stopped
==70111==
==70111== HEAP SUMMARY:
==70111==     in use at exit: 0 bytes in 0 blocks
==70111==   total heap usage: 18,885 allocs, 18,885 frees, 2,765,200 bytes allocated
==70111==
==70111== All heap blocks were freed -- no leaks are possible
==70111==
==70111== For lists of detected and suppressed errors, rerun with: -s
==70111== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

soedar avatar Sep 18 '23 03:09 soedar

@patrick-stephens could you help re-run the integration tests? I'm not quite sure why the integration runs failed the first time round. I've rebased onto master.

soedar avatar Oct 18 '23 07:10 soedar

> @patrick-stephens could you help re-run the integration tests? I'm not quite sure why the integration runs failed the first time round. I've rebased onto master.

Do you mean unit tests? Integration tests are not run unless this is labelled. macOS unit tests are flaky at the moment I believe so can be ignored as long as Linux passes.

patrick-stephens avatar Oct 19 '23 11:10 patrick-stephens

> Do you mean unit tests? Integration tests are not run unless this is labelled. macOS unit tests are flaky at the moment I believe so can be ignored as long as Linux passes.

Ah, that's what I meant. I noticed the failing macOS test and wasn't sure if that was the blocker for the PR.

What would be the next steps to move this PR forward?

soedar avatar Oct 20 '23 02:10 soedar

It's on the codeowners to review so will be in the queue.

patrick-stephens avatar Oct 20 '23 10:10 patrick-stephens

Can we extend this feature?

  1. The header name should be dynamic. There are many cases when other headers are used, for example Bearer header instead of apiKey header.
  2. The header value should be taken dynamically from a file instead of static value. The file can be dynamically updated when a value/token is updated/refreshed.
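A minimal sketch of the second suggestion, assuming the token sits on the first line of a plain-text file. read_token_file is a hypothetical helper; out_es has no such option in this PR. Re-reading the file before each request would pick up a refreshed token.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): reads the first line of
 * 'path' into 'out', stripping the trailing newline. A line longer than
 * the buffer is truncated to fit. Returns 0 on success, -1 if the file
 * cannot be read or the first line is empty.
 */
static int read_token_file(const char *path, char *out, size_t out_size)
{
    FILE *fp;
    size_t len;

    assert(path != NULL && out != NULL && out_size > 0);

    fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }

    if (fgets(out, (int) out_size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Cut the string at the first CR or LF, if any. */
    len = strcspn(out, "\r\n");
    out[len] = '\0';

    return len > 0 ? 0 : -1;
}
```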

alexku7 avatar Dec 07 '23 07:12 alexku7

> The header name should be dynamic. There are many cases when other headers are used, for example Bearer header instead of apiKey header.

Could you elaborate a use case where the Bearer header is used in the context of elasticsearch? This change in particular is to support integration with Elastic Cloud via API Keys (see https://www.elastic.co/guide/en/cloud/current/ec-api-authentication.html)

Regardless, I'm not sure that allowing users to specify arbitrary authorization headers is ideal, especially if the set of allowable authorization types for the plugin can be well defined.

> The header value should be taken dynamically from a file instead of static value. The file can be dynamically updated when a value/token is updated/refreshed.

Looking at other Fluent Bit output plugins, this does not appear to be a common pattern. (The only exception seems to be the Google Cloud credentials JSON, which contains quite a bit of auth information and is probably not the norm.) I would be hesitant to make this change in this PR without the maintainers' input, since this feels like a config design change that would also apply to other plugins.

soedar avatar Dec 14 '23 01:12 soedar

Personally, I would keep things simple in a PR and land one feature before adding more.

patrick-stephens avatar Dec 14 '23 10:12 patrick-stephens

Hi. On the other hand, Elasticsearch supports a JWT as a Bearer authorization header, and probably other methods. So why not universally support any HTTP header set by the user, either as an environment variable or as a file containing the header (for security reasons)?

We do a similar thing with Prometheus sending metrics to a remote store. See the authorization section here

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write

alexku7 avatar Dec 14 '23 10:12 alexku7

Hi any updates on merging this? This feature will be of great help ❤️

yanbutan avatar Jan 10 '24 10:01 yanbutan

That's a shame. I won't be able to use Fluent Bit because it does not support sending logs using Elastic API keys.

ICUMD avatar Jan 25 '24 19:01 ICUMD

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar May 01 '24 01:05 github-actions[bot]

rebased

@patrick-stephens @edsiper @PettitWesley could you remove the stale label?

soedar avatar May 01 '24 04:05 soedar

Adding a comment to move this out of the stale state.

What are the current blockers? This is a really long-awaited feature. Thank you @soedar for making this.

dariusvalaitis avatar Aug 12 '24 06:08 dariusvalaitis