fluent-bit
in_kubernetes_events: Efficiently stream kubernetes events via watch
Change the in_kubernetes_events plugin to watch Kubernetes events after requesting the event list. Instead of polling for the full event list every 500ms (default), the plugin requests an initial full event list and then requests a watch. The watch creates an efficient HTTP chunked stream that pushes events as they are added, modified, or deleted in the cluster. The interval_sec and interval_nsec plugin config options now act as a reconnect timer if the watch stream ends, instead of a timer to re-poll the k8s cluster.
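To illustrate what consuming such a stream involves, here is a minimal Python sketch (not the plugin's C implementation). It assumes the Kubernetes watch format of one JSON object per line, each with a `type` and an `object`, and shows that an HTTP chunk can end in the middle of an event, so the partial tail has to be buffered until the next chunk arrives. The helper name `drain_watch_buffer` and the sample payloads are hypothetical.

```python
import json


def drain_watch_buffer(buffer: bytes):
    """Return (decoded_events, leftover_bytes) for buffered watch-stream data."""
    # Everything before the last newline consists of complete JSON lines;
    # whatever follows it is a partial event that must wait for more data.
    complete, sep, remainder = buffer.rpartition(b"\n")
    if not sep:
        # No newline yet: nothing is complete, keep the whole buffer.
        return [], buffer
    events = [json.loads(line) for line in complete.split(b"\n") if line.strip()]
    return events, remainder


# Simulated HTTP chunks (hypothetical data): the second event is split
# across two chunks, as can happen on a real chunked stream.
chunks = [
    b'{"type":"ADDED","object":{"metadata":{"name":"ev1","resourceVersion":"101"}}}\n{"type":"MODI',
    b'FIED","object":{"metadata":{"name":"ev1","resourceVersion":"102"}}}\n',
]

pending = b""
seen = []
for chunk in chunks:
    events, pending = drain_watch_buffer(pending + chunk)
    for event in events:
        seen.append((event["type"], event["object"]["metadata"]["resourceVersion"]))

print(seen)
```

Tracking the last `resourceVersion` seen is what would let a client resume the watch after a reconnect instead of re-listing everything; whether the plugin does exactly that is not stated here.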
Potentially Breaking: this will require the kubernetes role used by fluent-bit to have watch permission in addition to the current list and get permissions.
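The permission change could look roughly like the following ClusterRole. This is a sketch and is not taken from this PR: the role name is hypothetical, and it assumes the plugin reads events from the core API group.

```yaml
# Hypothetical ClusterRole; only the "watch" verb is new compared to
# a role that already allowed "get" and "list" on events.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-events-reader   # hypothetical name
rules:
  - apiGroups: [""]                # assumption: core API group
    resources: ["events"]
    verbs: ["get", "list", "watch"]
```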
Fixes #8315
Leaving in draft as this is dependent on both #8316 & #8323, will rebase and move out of draft after those are reviewed/merged.
Enter [N/A] in the box, if an item is not applicable to your change.
Testing
Before we can approve your change; please submit the following in a comment:
- [X] Example configuration file for the change
[INPUT]
name kubernetes_events
tag k8s_events
- [ ] Debug log output from testing the change
- [ ] Attached Valgrind output that shows no leaks or memory corruption was found
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
- [ ] Run local packaging test showing all targets (including any new ones) build.
- [ ] Set ok-package-test label to test for all targets (requires maintainer to do).
Documentation
- [ ] Documentation required for this feature
Backporting
- [ ] Backport to latest stable release.
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
@edsiper - I just force pushed a new version of this. I still have this PR in draft mode because it was branched off of 2 other PRs: https://github.com/fluent/fluent-bit/pull/8316 & https://github.com/fluent/fluent-bit/pull/8323. I was assuming it would be easier to review those individually, and then I'd rebase this one so there is a smaller set of changes to review. If you'd prefer to review and merge them all in one PR, I can move this PR to ready for review.
@ryanohnemus https://github.com/fluent/fluent-bit/pull/8316 & https://github.com/fluent/fluent-bit/pull/8323 are already merged. Please let me know whether this is ready for v3 (before next Monday).
Sounds great! I will rebase and fix the merge conflict and have this ready by tomorrow morning!
@edsiper this should be ready; however, I'm getting CI build errors that I'm not able to reproduce in the dev container with the same cmake:
cmake -DFLB_BACKTRACE=Off -DFLB_SHARED_LIB=Off -DFLB_DEBUG=On -DFLB_ALL=On -DFLB_EXAMPLES=Off -DFLB_JEMALLOC=On -DFLB_TESTS_INTERNAL=On -DFLB_TESTS_RUNTIME=On -DFLB_WITHOUT_flb-rt-out_elasticsearch=1 -DFLB_WITHOUT_flb-rt-out_td=1 -DFLB_WITHOUT_flb-rt-out_forward=1 -DFLB_WITHOUT_flb-rt-in_disk=1 -DFLB_WITHOUT_flb-rt-in_proc=1 -DFLB_WITHOUT_flb-it-fstore=1 ../
The error is below, but I do not believe it is related to the changes in this diff:
[ 21%] Building C object plugins/processor_sql/parser/CMakeFiles/processor-sql-parser.dir/sql_parser.c.o
In file included from /home/runner/work/fluent-bit/fluent-bit/plugins/processor_sql/parser/sql_parser.c:9:0:
/home/runner/work/fluent-bit/fluent-bit/include/fluent-bit/flb_mem.h:31:10: fatal error: jemalloc/jemalloc.h: No such file or directory
#include <jemalloc/jemalloc.h>
^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [plugins/processor_sql/parser/CMakeFiles/processor-sql-parser.dir/build.make:90: plugins/processor_sql/parser/CMakeFiles/processor-sql-parser.dir/sql_parser.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:6975: plugins/processor_sql/parser/CMakeFiles/processor-sql-parser.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
@edsiper I was able to get a repro of the jemalloc build failure off the master branch and I believe it starts occurring with this commit: https://github.com/fluent/fluent-bit/commit/a89bf1cb7d8d7773161fd315abb8933c408a6600
@edsiper @patrick-stephens this didn't get merged in with 3.0 but it is ready to go. Can this be added to the next milestone/release?
Did you carry out the review comments from @edsiper ?
@patrick-stephens yes, unless I missed something they should all be marked resolved. I rebased and force pushed after some dependencies had a merge conflict, which I think messes with the comment visibility.
@pwhelan do you think you can take a look at this one ?
Rebased and force pushed the following updates:
- moved the upstream and the HTTP client for the stream into the context so they can be cleaned up on shutdown without leaks
- added 2 tests that use monkey/server to mock the k8s upstream connection
- the continuous wait on the stream still exists; assuming this is ok since the input is threaded
- a chunked-streaming test is something that can be added later on; these tests were meant to show that there is no memory leak:
valgrind --leak-check=full ./bin/flb-rt-in_kubernetes_events
[removed test output]
SUCCESS: All unit tests have passed.
==931==
==931== HEAP SUMMARY:
==931== in use at exit: 0 bytes in 0 blocks
==931== total heap usage: 3,845 allocs, 3,845 frees, 2,002,206 bytes allocated
==931==
==931== All heap blocks were freed -- no leaks are possible
==931==
==931== For lists of detected and suppressed errors, rerun with: -s
==931== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
@pwhelan please take another look
(posting this before CI has fully run, so I will review any test failures if they show up)
The AppVeyor and macOS build errors both appear to be flakes.
@pwhelan @edsiper @lecaros - this fell out of the next milestone with the last few 3.0 releases but is still ready to go. Can this be added to the existing 3.0.7 milestone? Thank you
Rebased to fix merge conflict.
@edsiper @pwhelan @lecaros could this be tagged in the 3.1.0 milestone so it does not get missed? Thank you!
We probably should update docs as well, particularly with the RBAC change. Could you link a docs PR @ryanohnemus ?
Do we have any int tests for this btw?
@patrick-stephens added doc via https://github.com/fluent/fluent-bit-docs/pull/1396
No int tests, but I added unit tests for the plugin in this PR.