
enhance fluentbit process logging for plugin

Open nwsparks opened this issue 2 years ago • 8 comments

When templating fails, it's very difficult to identify the source. The logs are flooded with errors like this:

time="2022-05-13T14:33:22Z" level=error msg="[cloudwatch 0] parsing log_group_name template '/eks/eks/$(kubernetes['namespace_name'])/$(kubernetes['labels']['k8s-app'])' (using value of default_log_group_name instead): k8s-app: sub-tag name not found"

But there is no way to identify which pod it is coming from.
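
For context, this error comes from a cloudwatch output block along these lines. A minimal sketch: only the log_group_name template is taken from the log line above; the region, fallback group name, and stream template are assumptions:

  [OUTPUT]
      Name                   cloudwatch
      Match                  *
      region                 us-east-1
      # the templated group; when a sub-tag is missing, records fall back
      # to default_log_group_name and the error above is logged
      log_group_name         /eks/eks/$(kubernetes['namespace_name'])/$(kubernetes['labels']['k8s-app'])
      default_log_group_name fluent-bit-fallback
      log_stream_name        $(kubernetes['pod_name'])
      auto_create_group      true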

nwsparks · May 13 '22 14:05

@nwsparks could you please explain a little more about how you'd like the logging improved?

zhonghui12 · May 20 '22 00:05

@zhonghui12 if the error log I posted (emitted when templating fails) included the name of the pod that triggered it, that would make it much easier to identify the source of the errors.

These error logs generate a TON of volume, and in a system with many deployments it is very difficult to track down the source.
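
For illustration, the requested change would look something like this. A purely hypothetical Go sketch, not the plugin's actual internals; the function name, record layout, and log format are illustrative assumptions:

  package main

  import (
      "errors"

      "github.com/sirupsen/logrus"
  )

  // logTemplateError shows the requested enhancement: when resolving a
  // log_group_name template fails, pull identifying metadata (the pod
  // name) out of the record and include it in the error message.
  func logTemplateError(record map[interface{}]interface{}, tmpl string, err error) {
      podName := "unknown"
      if k8s, ok := record["kubernetes"].(map[interface{}]interface{}); ok {
          if name, ok := k8s["pod_name"].(string); ok {
              podName = name
          }
      }
      logrus.Errorf("[cloudwatch 0] parsing log_group_name template '%s' (pod=%s): %v",
          tmpl, podName, err)
  }

  func main() {
      record := map[interface{}]interface{}{
          "kubernetes": map[interface{}]interface{}{
              "pod_name": "my-app-6f7d9c-xyz12", // hypothetical pod name
          },
      }
      logTemplateError(record,
          "/eks/eks/$(kubernetes['namespace_name'])/$(kubernetes['labels']['k8s-app'])",
          errors.New("k8s-app: sub-tag name not found"))
  }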

nwsparks · May 20 '22 11:05

Edited the subject to make it clearer that this is about the Fluent Bit process logs.

nwsparks · May 20 '22 13:05

Facing a similar issue. In one hour it generated 53 million records with the same error, "sub-tag name not found":

[Screenshot from 2022-08-09 showing the volume of repeated "sub-tag name not found" error records]

BTW, the day in the screenshot was the day we deployed the component to the cluster. We disabled it due to the high CloudWatch ingestion costs.

Cluster details:

- EKS version: 1.22
- Helm chart repo: https://aws.github.io/eks-charts
- Helm chart release name: aws-for-fluent-bit
- Helm chart version: 0.1.18
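
For anyone reproducing this, that setup corresponds to roughly the following install. A sketch; the kube-system namespace is an assumption:

  # add the EKS charts repo and install the chart version listed above
  helm repo add eks https://aws.github.io/eks-charts
  helm upgrade --install aws-for-fluent-bit eks/aws-for-fluent-bit \
      --namespace kube-system --version 0.1.18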

jeremiasroma · Aug 09 '22 06:08

Getting the same issue. I checked the logs that were forwarded to the default log group in CloudWatch, and saw that kubernetes['labels'] clearly contains the sub-tag.

Did you ever fix this?

We used to run https://github.com/DNXLabs/terraform-aws-eks-cloudwatch-logs

That uses the following config:

  set {
    name  = "cloudWatch.logGroupName"
    value = "/aws/eks/${var.cluster_name}/$(kubernetes['labels']['app'])"
  }

But that repo is no longer maintained, so I just installed the Helm chart https://artifacthub.io/packages/helm/aws/aws-for-fluent-bit and set the config value listed above. But now I get the following errors:

time="2023-03-21T09:49:24Z" level=error msg="[cloudwatch 0] parsing log_group_name template '/aws/eks/staging/$(kubernetes['labels']['app'])' (using value of default_log_group_name instead): app: sub-tag name not found"

Mattie112 · Mar 21 '23 09:03

@Mattie112 If you're still using the DNXLabs module, please update it to version 0.1.5. One of the changes is that instead of using the label "app", which may not exist, I've set "app.kubernetes.io/name", as it's one of the default Kubernetes labels.

According to the plugin's documentation, they've released a new version of the CloudWatch plugin which brings better performance and other improvements.

The main issue we've seen with the old cloudwatch plugin is that it couldn't handle logs when a label did not exist in a pod definition. The new cloudwatch_logs plugin uses a logGroupTemplate rather than a fixed logGroupName; if that template cannot be resolved, it falls back to the default log group "/aws/eks/fluentbit-cloudwatch/logs".
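
In the aws-for-fluent-bit chart that maps to roughly the following values. A sketch: the region is an assumption, and the template path is illustrative, combining the default label mentioned above with the chart's default log group:

  # values.yaml (aws-for-fluent-bit chart), newer cloudwatch_logs output
  cloudWatchLogs:
    enabled: true
    region: "us-east-1"   # assumed
    # fixed fallback group, used when the template cannot be resolved
    logGroupName: "/aws/eks/fluentbit-cloudwatch/logs"
    # resolved per record from kubernetes metadata
    logGroupTemplate: "/aws/eks/fluentbit-cloudwatch/$kubernetes['labels']['app.kubernetes.io/name']"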

Thanks

jeremiasroma · Jun 08 '23 03:06

Thanks! I have switched to https://github.com/aws/aws-for-fluent-bit; I prefer to have something that is maintained :) I don't really see the added benefit of that module other than saving a few lines of code.

And indeed, I am now using the namespace_name instead of the app label. I would still prefer the label, but hey, the namespace is fine for 99% of our cases :)

Mattie112 · Jun 08 '23 07:06