aws: add support for EKS Pod Identities
Enter [N/A] in the box, if an item is not applicable to your change.
Testing
Before we can approve your change; please submit the following in a comment:
- [ ] Example configuration file for the change
- [ ] Debug log output from testing the change
- [ ] Attached Valgrind output that shows no leaks or memory corruption was found
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
- [ ] Run local packaging test showing all targets (including any new ones) build.
- [ ] Set ok-package-test label to test for all targets (requires maintainer to do).
Documentation
- [ ] Documentation required for this feature
Backporting
- [ ] Backport to latest stable release.
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
@edsiper I tested these changes thoroughly on a new EKS cluster back in May. My change has unit tests which pass. I have just performed a simple rebase with no conflicts, so it should be safe to merge. Unfortunately I am unable to test this again right now.
Please see my comments on the alternate (mostly the same) implementation: https://github.com/fluent/fluent-bit/pull/9013#pullrequestreview-2237131723
@PettitWesley there are some memory leaks detected in the unit test:
https://github.com/fluent/fluent-bit/actions/runs/10381178881/job/28776004954?pr=9206#step:5:3421
@PettitWesley @iandrewt
In the branch eks-pod-identity3.0 I pushed some commits on top of this branch/PR to fix the leaks found. The patches, in order, are:
- https://github.com/fluent/fluent-bit/commit/d7a76688c8454736b08b488794dc4c8a69f7c181 aws: fix leaks on new EKS Pod identities
- https://github.com/fluent/fluent-bit/commit/f0e6f2bc13ee4f9e7e02d332fcdca24b44d3db03 utils: validate sds port
- https://github.com/fluent/fluent-bit/commit/ec2209c948ab0d825c5f1da357d84c96d7f4b079 tests: internal: aws_credentials_http: fix leaks
Remaining issues found with Valgrind:
valgrind --leak-check=full bin/flb-it-aws_credentials_http
Test test_http_validator_invalid_host... [ FAILED ]
aws_credentials_http.c:728: Check provider == NULL... failed
==964174== Warning: invalid file descriptor -1 in syscall close()
Test test_http_validator_invalid_port... [ FAILED ]
aws_credentials_http.c:757: Check provider == NULL... failed
==964174== Warning: invalid file descriptor -1 in syscall close()
FAILED: 2 of 11 unit tests have failed.
==964174==
==964174== HEAP SUMMARY:
==964174== in use at exit: 19,024 bytes in 9 blocks
==964174== total heap usage: 17,168 allocs, 17,159 frees, 1,808,409 bytes allocated
==964174==
==964174== 212 (96 direct, 116 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 9
==964174== at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==964174== by 0x16D064: flb_calloc (include/fluent-bit/flb_mem.h:95)
==964174== by 0x16D47F: flb_endpoint_provider_create (src/aws/flb_aws_credentials_http.c:265)
==964174== by 0x16DA26: flb_http_provider_create (src/aws/flb_aws_credentials_http.c:394)
==964174== by 0x162EDD: test_http_validator_invalid_host (tests/internal/aws_credentials_http.c:727)
==964174== by 0x1609B7: acutest_do_run_ (tests/internal/../lib/acutest/acutest.h:1034)
==964174== by 0x15F68D: acutest_run_ (tests/internal/../lib/acutest/acutest.h:1205)
==964174== by 0x15E329: main (tests/internal/../lib/acutest/acutest.h:1769)
==964174==
==964174== 212 (96 direct, 116 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 9
==964174== at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==964174== by 0x16D064: flb_calloc (include/fluent-bit/flb_mem.h:95)
==964174== by 0x16D47F: flb_endpoint_provider_create (src/aws/flb_aws_credentials_http.c:265)
==964174== by 0x16DA26: flb_http_provider_create (src/aws/flb_aws_credentials_http.c:394)
==964174== by 0x163053: test_http_validator_invalid_port (tests/internal/aws_credentials_http.c:756)
==964174== by 0x1609B7: acutest_do_run_ (tests/internal/../lib/acutest/acutest.h:1034)
==964174== by 0x15F68D: acutest_run_ (tests/internal/../lib/acutest/acutest.h:1205)
==964174== by 0x15E329: main (tests/internal/../lib/acutest/acutest.h:1769)
==964174==
==964174== 18,600 bytes in 1 blocks are definitely lost in loss record 9 of 9
==964174== at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==964174== by 0x15F8C4: flb_calloc (include/fluent-bit/flb_mem.h:95)
==964174== by 0x162E6F: test_http_validator_invalid_host (tests/internal/aws_credentials_http.c:722)
==964174== by 0x1609B7: acutest_do_run_ (tests/internal/../lib/acutest/acutest.h:1034)
==964174== by 0x15F68D: acutest_run_ (tests/internal/../lib/acutest/acutest.h:1205)
==964174== by 0x15E329: main (tests/internal/../lib/acutest/acutest.h:1769)
==964174==
==964174== LEAK SUMMARY:
==964174== definitely lost: 18,792 bytes in 3 blocks
==964174== indirectly lost: 232 bytes in 6 blocks
==964174== possibly lost: 0 bytes in 0 blocks
==964174== still reachable: 0 bytes in 0 blocks
==964174== suppressed: 0 bytes in 0 blocks
==964174==
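One possible reading of the output above: both 212-byte records come from flb_endpoint_provider_create, reached from tests that expected a NULL provider, so the provider was created for an invalid host or port and never destroyed. The following is a minimal sketch of the pattern those tests appear to expect, where the create function rejects bad input and releases everything it allocated before returning NULL. All names here (demo_provider, demo_provider_create, demo_host_is_valid, demo_usage) and the validation rule itself are hypothetical stand-ins, not the Fluent Bit API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a credentials provider; not a Fluent Bit struct. */
struct demo_provider {
    char *host;
    int port;
};

/* Release a provider and everything it owns. Safe to call with NULL. */
void demo_provider_destroy(struct demo_provider *p)
{
    if (p == NULL) {
        return;
    }
    free(p->host);
    free(p);
}

/* Assumed rule for illustration only: non-empty host without spaces. */
int demo_host_is_valid(const char *host)
{
    return host != NULL && host[0] != '\0' && strchr(host, ' ') == NULL;
}

/* Returns a new provider, or NULL on invalid input or allocation failure.
 * Every failure after calloc() goes through the same cleanup label, so no
 * partially built provider is left behind. */
struct demo_provider *demo_provider_create(const char *host, int port)
{
    struct demo_provider *p;
    size_t len;

    p = calloc(1, sizeof(*p));
    if (p == NULL) {
        return NULL;
    }

    if (!demo_host_is_valid(host) || port < 1 || port > 65535) {
        goto error;
    }

    len = strlen(host);
    p->host = malloc(len + 1);
    if (p->host == NULL) {
        goto error;
    }
    memcpy(p->host, host, len + 1);
    p->port = port;
    return p;

error:
    demo_provider_destroy(p);
    return NULL;
}

/* Example caller: a valid provider is destroyed by its owner, and invalid
 * input yields NULL with nothing left to free. */
int demo_usage(void)
{
    struct demo_provider *p = demo_provider_create("localhost", 80);

    assert(p != NULL);
    demo_provider_destroy(p);

    assert(demo_provider_create("bad host", 80) == NULL);
    assert(demo_provider_create("localhost", 70000) == NULL);
    return 0;
}
```

The 18,600-byte record is a separate case: it is allocated directly in test_http_validator_invalid_host, so it would need a matching free in the test itself rather than in the provider code.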
Just trying to speed things up: can you please review the commits and cherry-pick them?
Moving this to 3.2. We need someone to incorporate the changes.
Hi @edsiper and @PettitWesley, I created a new PR merging both your changes and resolved the master branch merge conflicts in this PR: https://github.com/fluent/fluent-bit/pull/9696. I tested the changes in EKS and verified that they work. Could you guys take a look?
If we prefer to keep the contributions in this PR, let me know. Unsure if I need to be granted any access to make changes to this PR or not if we go that route.
Honestly, kinda forgot to follow this one up. Things are slow in December at work, so I'll have some time to test this out on Monday Australia time.
I've deployed @zhihonl's branch to a non production cluster this morning, no issues so far! S3 uploads are working fine. Will check again on Monday to see if anything pops up over the weekend.
This can be closed as duplicate of https://github.com/fluent/fluent-bit/pull/10114
Closing from this