[EKS]: amazon-cloudwatch-observability addon having issue when passing fluentBit config using configuration_values

Open alihamza-official opened this issue 1 year ago • 4 comments

I am creating the EKS CloudWatch Observability addon using Terraform and passing a custom Fluent Bit configuration through configuration_values. All custom configuration files are in a configs directory.


When I deploy it, the operation times out after 20 minutes.

The logs of the Fluent Bit pods show that a file is already present and that the configuration contains an error. If I delete the default config and deploy the same configuration using a terraform_data resource, it deploys and works perfectly fine.


alihamza-official, Nov 12 '24

@tabern @maishsk @jlbutler Is there a fix for this issue? Custom Fluent Bit configuration is not working properly. Looking forward to a fix. Thanks!

alihamza-official, Nov 27 '24

Would you mind providing some additional details about your exact file structure and the specific Terraform commands you are running, so we can reproduce the issue?

sky333999, Jan 08 '25

Sure, @sky333999, here is the resource:

resource "aws_eks_addon" "amazon_cloudwatch_observability" {
  cluster_name                = "eks-test-cluster"
  addon_name                  = "amazon-cloudwatch-observability"
  addon_version               = null # Omitted: EKS installs the default version for the cluster, not necessarily the latest

  configuration_values = jsonencode({
    "containerLogs" = {
      "enabled" = true
      "fluentBit" = {
        "config" = {
          "service" = file("${path.module}/configs/fluent-bit.conf"),
          "customParsers" = file("${path.module}/configs/parsers.conf"),
          "extraFiles" = {
            "application-log.conf" = file("${path.module}/configs/application-log.conf"),
            "dataplane-log.conf" = file("${path.module}/configs/dataplane-log.conf"),
            "host-log.conf" = file("${path.module}/configs/host-log.conf")
          }
        }
      }
    }
  })
}

Here is the content of application-log.conf:

[INPUT]
    Name                tail
    Tag                 application.*
    Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
    Path                /var/log/containers/*.log
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  container_firstline
    Parser              console
    DB                  /var/fluent-bit/state/flb_container.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Rotate_Wait         30
    storage.type        filesystem
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 application.*
    Path                /var/log/containers/fluent-bit*
    Parser              docker
    DB                  /var/fluent-bit/state/flb_log.db
    Mem_Buf_Limit       5MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 application.*
    Path                /var/log/containers/cloudwatch-agent*
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  cwagent_firstline
    Parser              docker
    DB                  /var/fluent-bit/state/flb_cwagent.db
    Mem_Buf_Limit       5MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
    Name                kubernetes
    Match               application.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_Tag_Prefix     application.var.log.containers.
    Merge_Log           On
    Merge_Log_Key       log_processed
    K8S-Logging.Parser  On
    K8S-Logging.Exclude Off
    Labels              Off
    Annotations         Off
    Use_Kubelet         On
    Kubelet_Port        10250
    Buffer_Size         0

[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
    log_stream_prefix   ${HOST_NAME}-
    auto_create_group   true
    extra_user_agent    container-insights

The other config files are the defaults. You can reproduce the issue by running terraform apply. Thanks!

alihamza-official, Jan 08 '25

To anyone running into this issue:

It's caused by a faulty Fluent Bit configuration.

If you, like me, copied the config from a running default deployment of the CloudWatch addon, you might also have copied the @INCLUDE lines from fluent-bit.conf. When you then deploy it with the method shown above, the resulting configuration looks like this:

[SERVICE]
  Flush                     5
  Grace                     30
  Log_Level                 error
  Daemon                    off
  Parsers_File              parsers.conf
  storage.path              /var/fluent-bit/state/flb-storage/
  storage.sync              normal
  storage.checksum          off
  storage.backlog.mem_limit 5M
@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
@INCLUDE host-log.conf
@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
@INCLUDE host-log.conf

Each file is included twice.
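Since the duplicates come from @INCLUDE lines copied into configs/fluent-bit.conf, one way to catch this early is a lifecycle precondition that fails terraform plan instead of letting the addon update run into the 20-minute timeout. This is a sketch, assuming Terraform 1.2 or later (which added preconditions); the local name fluent_bit_service is hypothetical, and the rest mirrors the resource from this issue.

```hcl
locals {
  # Hypothetical local: read the service config once so it can be both
  # validated and passed to the addon.
  fluent_bit_service = file("${path.module}/configs/fluent-bit.conf")
}

resource "aws_eks_addon" "amazon_cloudwatch_observability" {
  cluster_name = "eks-test-cluster"
  addon_name   = "amazon-cloudwatch-observability"

  configuration_values = jsonencode({
    containerLogs = {
      enabled = true
      fluentBit = {
        config = {
          service       = local.fluent_bit_service
          customParsers = file("${path.module}/configs/parsers.conf")
          extraFiles = {
            "application-log.conf" = file("${path.module}/configs/application-log.conf")
            "dataplane-log.conf"   = file("${path.module}/configs/dataplane-log.conf")
            "host-log.conf"        = file("${path.module}/configs/host-log.conf")
          }
        }
      }
    }
  })

  lifecycle {
    # file() is known at plan time, so this check fails `terraform plan`
    # if any line of the service config starts with @INCLUDE.
    precondition {
      condition     = length(regexall("(?m)^@INCLUDE", local.fluent_bit_service)) == 0
      error_message = "configs/fluent-bit.conf must not contain @INCLUDE lines; the addon adds them on deployment."
    }
  }
}
```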

These @INCLUDE statements should not be in the fluent-bit.conf in your repo, because the addon adds them automatically on deployment. Once I removed those lines, the configuration became valid and it deployed successfully with Terraform.
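If you would rather keep the copied file untouched, a minimal sketch is to strip the @INCLUDE lines in Terraform before passing the file as service. This assumes, as reported in this thread, that the addon adds the @INCLUDE lines itself on deployment; the local name fluent_bit_service_stripped is hypothetical.

```hcl
locals {
  # replace() treats a pattern wrapped in forward slashes as a regular
  # expression. (?m) makes ^ match at the start of every line, so each
  # line beginning with @INCLUDE is removed along with its trailing newline.
  fluent_bit_service_stripped = replace(
    file("${path.module}/configs/fluent-bit.conf"),
    "/(?m)^@INCLUDE.*\n?/",
    ""
  )
}
```

In the aws_eks_addon resource, set "service" = local.fluent_bit_service_stripped instead of calling file() directly. Removing the lines from the file, as described above, is still the simpler fix; this only guards against copying them back in.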

Hope this helps someone!

thelil93, Apr 29 '25