filter_ecs: new filter for AWS ECS Metadata

Open · PettitWesley opened this pull request 3 years ago · 1 comment

Signed-off-by: Wesley Pettit [email protected]


Enter [N/A] in the box if an item is not applicable to your change.

Testing

Before we can approve your change, please submit the following in a comment:

  • [ ] Example configuration file for the change
  • [ ] Debug log output from testing the change
  • [ ] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [ ] Attached local packaging test output showing all targets (including any new ones) build.

Documentation

  • [ ] Documentation required for this feature

Backporting

  • [ ] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.

PettitWesley, Aug 16 '22 14:08

Running on an instance inside an ECS cluster:

$ docker ps
CONTAINER ID   IMAGE                                                             COMMAND                CREATED      STATUS                PORTS     NAMES
de7fbb1b66db   111111111111.dkr.ecr.us-west-2.amazonaws.com/better-json-logger   "python ./logger.py"   3 days ago   Up 3 days                       ecs-fb-daemon-demo-1-app-c0d3dccbb0fdcd820400
c5d660dc5642   amazon/amazon-ecs-agent:latest                                    "/agent"               4 days ago   Up 4 days (healthy)             ecs-agent

The first container belongs to an ECS Task, and the filter's purpose is to attach that task's metadata to the container's logs. I could set up Fluent Bit to actually collect those logs, but for testing, the easiest approach is a config that mimics the tag a real task's logs would carry:

[SERVICE]
    Log_Level info
    Grace 1

[INPUT]
    Name dummy
    Tag prefix.de7fbb1b66db


[INPUT]
    Name        forward
    Listen      0.0.0.0
    Port        24224

[FILTER]
    Name ecs
    Match *
    ECS_Tag_Prefix prefix.
    ecs_meta_cache_ttl 6h
#    Cluster_Metadata_Only On
    ADD THE_CLUSTER_IS $ClusterName
    ADD THE_CONTAINER_INSTANCE_ARN_IS $ContainerInstanceArn
    ADD THE_CONTAINER_INSTANCE_ID_IS $ContainerInstanceID
    ADD THE_ECS_AGENT_VERSION_IS $ECSAgentVersion
    ADD THE_TASK_ID_IS $TaskID
    ADD THE_TASK_ARN_IS $TaskARN
    ADD THE_TASK_DEF_CONTAINER_NAME_IS $ContainerName
    ADD THE_DOCKER_CONTAINER_NAME_IS $DockerContainerName
    ADD THE_DOCKER_ID_IS $ContainerID
    ADD THE_TASK_DEF_FAMILY_IS $TaskDefFamily
    ADD THE_TASK_DEF_VERSION_IS $TaskDefVersion

[OUTPUT]
    Name stdout
    Format json_lines
    Match *

Even though the input is dummy, the configured tag and ECS_Tag_Prefix make the filter treat the logs as if they came from the task:
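The filter's lookup code is not shown in this thread. The sketch below assumes it works the way this test implies: whatever follows ECS_Tag_Prefix in the tag (here `de7fbb1b66db`) is treated as the Docker container ID, which is then used in a `/v1/tasks?dockerid=<id>` request to the ECS Agent introspection endpoint (the same path that appears in the test logs later in this thread). Both function names are hypothetical, not the plugin's API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: derive the (short) Docker container ID from a record
 * tag by stripping the configured ECS_Tag_Prefix. Returns a pointer into
 * `tag`, or NULL if the tag lacks the prefix or nothing follows it.
 */
static const char *ecs_container_id_from_tag(const char *tag, const char *prefix)
{
    size_t plen;

    assert(tag != NULL && prefix != NULL);
    plen = strlen(prefix);

    if (strncmp(tag, prefix, plen) != 0) {
        return NULL;    /* tag was not produced with this prefix */
    }
    if (tag[plen] == '\0') {
        return NULL;    /* prefix only, no container ID */
    }
    return tag + plen;
}

/*
 * Hypothetical helper: build the introspection path for one container,
 * mirroring the "/v1/tasks?dockerid=<id>" path seen in the logs.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int ecs_task_path(char *buf, size_t size, const char *docker_id)
{
    int n = snprintf(buf, size, "/v1/tasks?dockerid=%s", docker_id);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With the config above, the tag `prefix.de7fbb1b66db` yields the ID `de7fbb1b66db` and the path `/v1/tasks?dockerid=de7fbb1b66db`; a tag without the prefix yields NULL, so such a record would get no task lookup under this assumption.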

Fluent Bit v1.9.7
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/08/16 14:51:53] [ info] [fluent bit] version=1.9.7, commit=3d57a63e54, pid=24547
[2022/08/16 14:51:53] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/08/16 14:51:53] [ info] [cmetrics] version=0.3.5
[2022/08/16 14:51:53] [ info] [input:forward:forward.1] listening on 0.0.0.0:24224
[2022/08/16 14:51:54] [ info] [sp] stream processor started
[2022/08/16 14:51:54] [ info] [output:stdout:stdout.0] worker #0 started
{
  "date": 1660661514.303914,
  "message": "dummy",
  "THE_CLUSTER_IS": "fb-daemon-project",
  "THE_CONTAINER_INSTANCE_ARN_IS": "arn:aws:ecs:us-west-1:111111111111:container-instance/fb-daemon-project/c7e5e34d4157429c90337d8e6f130612",
  "THE_CONTAINER_INSTANCE_ID_IS": "c7e5e34d4157429c90337d8e6f130612",
  "THE_ECS_AGENT_VERSION_IS": "Amazon ECS Agent - v1.61.3 (63f97f40)",
  "THE_TASK_ID_IS": "bf3152cb-08c8-4f76-b974-0ad5b2993f9d",
  "THE_TASK_ARN_IS": "arn:aws:ecs:us-west-1:111111111111:task/bf3152cb-08c8-4f76-b974-0ad5b2993f9d",
  "THE_TASK_DEF_CONTAINER_NAME_IS": "app",
  "THE_DOCKER_CONTAINER_NAME_IS": "ecs-fb-daemon-demo-1-app-c0d3dccbb0fdcd820400",
  "THE_DOCKER_ID_IS": "de7fbb1b66db297c51aff04e3ca90d2a9df690bb79cd5eadc9ccfa4bf02c6779",
  "THE_TASK_DEF_FAMILY_IS": "fb-daemon-demo",
  "THE_TASK_DEF_VERSION_IS": "1"
}
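Each `ADD` line pairs an output key with a `$Variable` template, and the record above shows every template replaced by the matching value from the ECS Agent. As a rough illustration only, the C sketch below expands `$Name` placeholders from a lookup table. The struct, the function names, and the rule that a variable name is a run of letters and digits are all assumptions; the plugin's actual templating code is not shown in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical lookup entry: variable name (without '$') -> value. */
struct ecs_var {
    const char *name;   /* e.g. "ClusterName" */
    const char *value;  /* e.g. "fb-daemon-project" */
};

/* Append n bytes of s to out, keeping it NUL-terminated; -1 if no room. */
static int append(char *out, size_t size, size_t *o, const char *s, size_t n)
{
    if (*o + n >= size) {
        return -1;
    }
    memcpy(out + *o, s, n);
    *o += n;
    out[*o] = '\0';
    return 0;
}

/*
 * Expand every "$Name" in tmpl using vars. Unknown variables are copied
 * through unchanged. Returns 0 on success, -1 if out is too small.
 */
static int expand_template(const char *tmpl, const struct ecs_var *vars,
                           size_t nvars, char *out, size_t size)
{
    size_t o = 0;
    const char *p = tmpl;

    assert(size > 0);
    out[0] = '\0';

    while (*p != '\0') {
        if (*p != '$') {
            if (append(out, size, &o, p, 1) != 0) {
                return -1;
            }
            p++;
            continue;
        }

        /* Measure the variable name that follows '$'. */
        const char *name = p + 1;
        size_t len = 0;
        while (isalnum((unsigned char) name[len])) {
            len++;
        }

        /* Look the name up in the table (exact, case-sensitive match). */
        const char *value = NULL;
        for (size_t i = 0; i < nvars; i++) {
            if (strlen(vars[i].name) == len &&
                strncmp(vars[i].name, name, len) == 0) {
                value = vars[i].value;
                break;
            }
        }

        if (value != NULL) {
            if (append(out, size, &o, value, strlen(value)) != 0) {
                return -1;
            }
        }
        else if (append(out, size, &o, p, len + 1) != 0) {
            return -1;  /* "$Name" copied verbatim did not fit */
        }
        p = name + len;
    }
    return 0;
}
```

For example, expanding `$ClusterName` with the entry `{ "ClusterName", "fb-daemon-project" }` produces `fb-daemon-project`, the value shown for `THE_CLUSTER_IS` in the output above.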

Valgrind reports no leaks:

==24642== HEAP SUMMARY:
==24642==     in use at exit: 80 bytes in 1 blocks
==24642==   total heap usage: 5,803 allocs, 5,802 frees, 832,891 bytes allocated
==24642==
==24642== LEAK SUMMARY:
==24642==    definitely lost: 0 bytes in 0 blocks
==24642==    indirectly lost: 0 bytes in 0 blocks
==24642==      possibly lost: 0 bytes in 0 blocks
==24642==    still reachable: 80 bytes in 1 blocks
==24642==         suppressed: 0 bytes in 0 blocks
==24642== Rerun with --leak-check=full to see details of leaked memory
==24642==
==24642== For lists of detected and suppressed errors, rerun with: -s
==24642== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
SUCCESS: All unit tests have passed.

PettitWesley, Aug 16 '22 14:08

Doc PR: https://github.com/fluent/fluent-bit-docs/pull/925

PettitWesley, Oct 10 '22 22:10

[ec2-user@ip-10-192-11-106 build]$ valgrind ./bin/flb-rt-filter_ecs
==8030== Memcheck, a memory error detector
==8030== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==8030== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==8030== Command: ./bin/flb-rt-filter_ecs
==8030==
Test flb_test_ecs_filter...                     [2022/10/10 23:02:54] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8031
[2022/10/10 23:02:54] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:02:54] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:02:54] [ info] [sp] stream processor started
==8031== Warning: client switching stacks?  SP change: 0x910a778 --> 0x8437f20
==8031==          to suppress, use: --max-stackframe=13445208 or greater
==8031== Warning: client switching stacks?  SP change: 0x8437e98 --> 0x910a778
==8031==          to suppress, use: --max-stackframe=13445344 or greater
==8031== Warning: client switching stacks?  SP change: 0x910a998 --> 0x8437e98
==8031==          to suppress, use: --max-stackframe=13445888 or greater
==8031==          further instances of this message will not be shown.
[2022/10/10 23:02:56] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:02:57] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8031==
==8031== HEAP SUMMARY:
==8031==     in use at exit: 80 bytes in 1 blocks
==8031==   total heap usage: 5,817 allocs, 5,816 frees, 892,886 bytes allocated
==8031==
==8031== LEAK SUMMARY:
==8031==    definitely lost: 0 bytes in 0 blocks
==8031==    indirectly lost: 0 bytes in 0 blocks
==8031==      possibly lost: 0 bytes in 0 blocks
==8031==    still reachable: 80 bytes in 1 blocks
==8031==         suppressed: 0 bytes in 0 blocks
==8031== Rerun with --leak-check=full to see details of leaked memory
==8031==
==8031== For lists of detected and suppressed errors, rerun with: -s
==8031== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Test flb_test_ecs_filter_no_prefix...           [2022/10/10 23:02:57] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8034
[2022/10/10 23:02:57] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:02:57] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:02:58] [ info] [sp] stream processor started
==8034== Warning: client switching stacks?  SP change: 0x910a778 --> 0x8437ec0
==8034==          to suppress, use: --max-stackframe=13445304 or greater
==8034== Warning: client switching stacks?  SP change: 0x8437e38 --> 0x910a778
==8034==          to suppress, use: --max-stackframe=13445440 or greater
==8034== Warning: client switching stacks?  SP change: 0x910a998 --> 0x8437e38
==8034==          to suppress, use: --max-stackframe=13445984 or greater
==8034==          further instances of this message will not be shown.
[2022/10/10 23:03:00] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:03:00] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8034==
==8034== HEAP SUMMARY:
==8034==     in use at exit: 80 bytes in 1 blocks
==8034==   total heap usage: 5,817 allocs, 5,816 frees, 892,808 bytes allocated
==8034==
==8034== LEAK SUMMARY:
==8034==    definitely lost: 0 bytes in 0 blocks
==8034==    indirectly lost: 0 bytes in 0 blocks
==8034==      possibly lost: 0 bytes in 0 blocks
==8034==    still reachable: 80 bytes in 1 blocks
==8034==         suppressed: 0 bytes in 0 blocks
==8034== Rerun with --leak-check=full to see details of leaked memory
==8034==
==8034== For lists of detected and suppressed errors, rerun with: -s
==8034== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Test flb_test_ecs_filter_cluster_metadata_only... [2022/10/10 23:03:00] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8037
[2022/10/10 23:03:00] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:03:00] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:03:01] [ info] [sp] stream processor started
==8037== Warning: client switching stacks?  SP change: 0x910a778 --> 0x842cf50
==8037==          to suppress, use: --max-stackframe=13490216 or greater
==8037== Warning: client switching stacks?  SP change: 0x842cec8 --> 0x910a778
==8037==          to suppress, use: --max-stackframe=13490352 or greater
==8037== Warning: client switching stacks?  SP change: 0x910a998 --> 0x842cec8
==8037==          to suppress, use: --max-stackframe=13490896 or greater
==8037==          further instances of this message will not be shown.
[2022/10/10 23:03:03] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:03:03] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8037==
==8037== HEAP SUMMARY:
==8037==     in use at exit: 80 bytes in 1 blocks
==8037==   total heap usage: 5,800 allocs, 5,799 frees, 848,966 bytes allocated
==8037==
==8037== LEAK SUMMARY:
==8037==    definitely lost: 0 bytes in 0 blocks
==8037==    indirectly lost: 0 bytes in 0 blocks
==8037==      possibly lost: 0 bytes in 0 blocks
==8037==    still reachable: 80 bytes in 1 blocks
==8037==         suppressed: 0 bytes in 0 blocks
==8037== Rerun with --leak-check=full to see details of leaked memory
==8037==
==8037== For lists of detected and suppressed errors, rerun with: -s
==8037== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Test flb_test_ecs_filter_cluster_error...       [2022/10/10 23:03:03] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8041
[2022/10/10 23:03:03] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:03:03] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:03:04] [ warn] [filter:ecs:ecs.0] Failed to get metadata from /v1/metadata, will retry
[2022/10/10 23:03:04] [ info] [sp] stream processor started
[2022/10/10 23:03:04] [ warn] [filter:ecs:ecs.0] Failed to get metadata from /v1/metadata, will retry
[2022/10/10 23:03:04] [error] [filter:ecs:ecs.0] Could not retrieve cluster metadata from ECS Agent
==8041== Warning: client switching stacks?  SP change: 0x910a778 --> 0x841dae0
==8041==          to suppress, use: --max-stackframe=13552792 or greater
==8041== Warning: client switching stacks?  SP change: 0x841da58 --> 0x910a778
==8041==          to suppress, use: --max-stackframe=13552928 or greater
==8041== Warning: client switching stacks?  SP change: 0x910a998 --> 0x841da58
==8041==          to suppress, use: --max-stackframe=13553472 or greater
==8041==          further instances of this message will not be shown.
[2022/10/10 23:03:06] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:03:06] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8041==
==8041== HEAP SUMMARY:
==8041==     in use at exit: 80 bytes in 1 blocks
==8041==   total heap usage: 5,770 allocs, 5,769 frees, 788,438 bytes allocated
==8041==
==8041== LEAK SUMMARY:
==8041==    definitely lost: 0 bytes in 0 blocks
==8041==    indirectly lost: 0 bytes in 0 blocks
==8041==      possibly lost: 0 bytes in 0 blocks
==8041==    still reachable: 80 bytes in 1 blocks
==8041==         suppressed: 0 bytes in 0 blocks
==8041== Rerun with --leak-check=full to see details of leaked memory
==8041==
==8041== For lists of detected and suppressed errors, rerun with: -s
==8041== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Test flb_test_ecs_filter_task_error...          [2022/10/10 23:03:06] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8044
[2022/10/10 23:03:06] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:03:06] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:03:07] [ info] [sp] stream processor started
[2022/10/10 23:03:07] [ warn] [filter:ecs:ecs.0] Failed to get metadata from /v1/tasks?dockerid=79c796ed2a7f, will retry
[2022/10/10 23:03:07] [error] [filter:ecs:ecs.0] Requesting metadata from ECS Agent introspection endpoint failed
[2022/10/10 23:03:07] [error] [filter:ecs:ecs.0] Failed to get ECS Task metadata for 79c796ed2a7f, falling back to process cluster metadata only. If this is intentional, set `Cluster_Metadata_Only On`
==8044== Warning: client switching stacks?  SP change: 0x910a778 --> 0x8428f20
==8044==          to suppress, use: --max-stackframe=13506648 or greater
==8044== Warning: client switching stacks?  SP change: 0x8428e98 --> 0x910a778
==8044==          to suppress, use: --max-stackframe=13506784 or greater
==8044== Warning: client switching stacks?  SP change: 0x910a998 --> 0x8428e98
==8044==          to suppress, use: --max-stackframe=13507328 or greater
==8044==          further instances of this message will not be shown.
[2022/10/10 23:03:09] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:03:09] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8044==
==8044== HEAP SUMMARY:
==8044==     in use at exit: 80 bytes in 1 blocks
==8044==   total heap usage: 5,798 allocs, 5,797 frees, 832,641 bytes allocated
==8044==
==8044== LEAK SUMMARY:
==8044==    definitely lost: 0 bytes in 0 blocks
==8044==    indirectly lost: 0 bytes in 0 blocks
==8044==      possibly lost: 0 bytes in 0 blocks
==8044==    still reachable: 80 bytes in 1 blocks
==8044==         suppressed: 0 bytes in 0 blocks
==8044== Rerun with --leak-check=full to see details of leaked memory
==8044==
==8044== For lists of detected and suppressed errors, rerun with: -s
==8044== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
SUCCESS: All unit tests have passed.
==8030==
==8030== HEAP SUMMARY:
==8030==     in use at exit: 0 bytes in 0 blocks
==8030==   total heap usage: 6 allocs, 6 frees, 2,837 bytes allocated
==8030==
==8030== All heap blocks were freed -- no leaks are possible
==8030==
==8030== For lists of detected and suppressed errors, rerun with: -s
==8030== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

PettitWesley, Oct 10 '22 23:10