[EKS]: amazon-cloudwatch-observability addon has an issue when the fluentBit config is passed via configuration_values
I am creating the EKS CloudWatch Observability addon using Terraform with the config below. All custom configurations live in the configs directory.
When deploying it, the apply times out after 20 minutes.
Exploring the logs of the fluent-bit pods shows that the config file is already present and contains an error. If I delete the default config and deploy the same configuration using a terraform_data resource instead, it deploys and works perfectly fine.
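For reference, the terraform_data workaround looks roughly like this (a simplified sketch; the amazon-cloudwatch namespace and fluent-bit-config ConfigMap name match what the default deployment uses, but the kubectl steps are illustrative, not my exact code):

```hcl
# Sketch of the terraform_data workaround: replace the Fluent Bit ConfigMap
# out-of-band instead of passing the files through configuration_values.
resource "terraform_data" "fluent_bit_config" {
  # Re-run whenever any of the config files change
  triggers_replace = [
    for f in fileset("${path.module}/configs", "*.conf") :
    filesha256("${path.module}/configs/${f}")
  ]

  provisioner "local-exec" {
    command = <<-EOT
      kubectl -n amazon-cloudwatch delete configmap fluent-bit-config --ignore-not-found
      kubectl -n amazon-cloudwatch create configmap fluent-bit-config --from-file=${path.module}/configs/
    EOT
  }
}
```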
@tabern @maishsk @jlbutler Is there a fix for this issue? The custom fluentBit config is not working properly. Looking forward to it. Thanks
Would you mind providing some additional details on your exact file structure and the specific Terraform commands you are running, so we can replicate the issue?
Sure @sky333999, here is the resource:
resource "aws_eks_addon" "amazon_cloudwatch_observability" {
cluster_name = "eks-test-cluster"
addon_name = "amazon-cloudwatch-observability"
addon_version = null # For latest version
configuration_values = jsonencode({
"containerLogs" = {
"enabled" = true
"fluentBit" = {
"config" = {
"service" = file("${path.module}/configs/fluent-bit.conf"),
"customParsers" = file("${path.module}/configs/parsers.conf"),
"extraFiles" = {
"application-log.conf" = file("${path.module}/configs/application-log.conf"),
"dataplane-log.conf" = file("${path.module}/configs/dataplane-log.conf"),
"host-log.conf" = file("${path.module}/configs/host-log.conf")
}
}
}
}
})
}
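As a sanity check, the schema the addon accepts for configuration_values can be pulled with the AWS CLI (the addon version below is only an example):

```sh
# List an available version of the addon, then fetch the JSON schema
# of the configuration values it accepts
aws eks describe-addon-versions \
  --addon-name amazon-cloudwatch-observability \
  --query 'addons[0].addonVersions[0].addonVersion'

aws eks describe-addon-configuration \
  --addon-name amazon-cloudwatch-observability \
  --addon-version v1.5.0-eksbuild.1 \
  --query 'configurationSchema' --output text
```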
For application-log.conf, here is the content:
```
[INPUT]
    Name                tail
    Tag                 application.*
    Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
    Path                /var/log/containers/*.log
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  container_firstline
    Parser              console
    DB                  /var/fluent-bit/state/flb_container.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Rotate_Wait         30
    storage.type        filesystem
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 application.*
    Path                /var/log/containers/fluent-bit*
    Parser              docker
    DB                  /var/fluent-bit/state/flb_log.db
    Mem_Buf_Limit       5MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 application.*
    Path                /var/log/containers/cloudwatch-agent*
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  cwagent_firstline
    Parser              docker
    DB                  /var/fluent-bit/state/flb_cwagent.db
    Mem_Buf_Limit       5MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
    Name                kubernetes
    Match               application.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_Tag_Prefix     application.var.log.containers.
    Merge_Log           On
    Merge_Log_Key       log_processed
    K8S-Logging.Parser  On
    K8S-Logging.Exclude Off
    Labels              Off
    Annotations         Off
    Use_Kubelet         On
    Kubelet_Port        10250
    Buffer_Size         0

[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
    log_stream_prefix   ${HOST_NAME}-
    auto_create_group   true
    extra_user_agent    container-insights
```
The other configs are the default ones. You can test with the terraform apply command. Thanks
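As a side note, the individual Fluent Bit files can be syntax-checked locally before an apply, assuming a fluent-bit binary is installed (run from the configs directory so relative includes resolve; placeholders such as ${AWS_REGION} may need dummy values exported first):

```sh
# Parse the config and exit without starting the pipeline
cd configs
fluent-bit --dry-run --config fluent-bit.conf
```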
To anyone running into this issue:
It's caused by a faulty Fluent Bit configuration.
If you, like me, copied the config from a running default deployment of the CloudWatch addon, you might have also copied the @INCLUDE lines from fluent-bit.conf. When you then deploy this with the method shown above, the rendered configuration ends up looking like this:
```
[SERVICE]
    Flush                     5
    Grace                     30
    Log_Level                 error
    Daemon                    off
    Parsers_File              parsers.conf
    storage.path              /var/fluent-bit/state/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5M

@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
@INCLUDE host-log.conf
@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
@INCLUDE host-log.conf
```
The files are included twice.
These @INCLUDE statements should not be added to the config file in your repo; they are appended automatically on deployment. Once I removed those lines, the configuration became valid and it deployed successfully with Terraform.
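In other words, based on the rendered output above, configs/fluent-bit.conf should contain only the [SERVICE] section:

```
[SERVICE]
    Flush                     5
    Grace                     30
    Log_Level                 error
    Daemon                    off
    Parsers_File              parsers.conf
    storage.path              /var/fluent-bit/state/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5M
```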
Hope this helps someone!