containers-roadmap
[EKS] [request]: Add default toleration for Amazon CloudWatch Observability EKS add-on
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
Please add a default toleration to the Amazon CloudWatch Observability EKS add-on, as other add-ons already have, so that cloudwatch-agent and fluentbit deploy on all nodes regardless of their taints:
tolerations:
- operator: Exists
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Fluentbit and cloudwatch agent installed through Amazon CloudWatch Observability EKS add-on aren't deploying on all nodes due to node taints and lack of default tolerations.
Are you currently working around this issue?
No
Additional context
None
Please also allow users to customize the configuration maps, such as fluent-bit-config
While we wait for the feature, is there any best-practice workaround that would allow us to run the daemonset on all nodes? I thought about editing the daemonset directly, but then the next time we do an add-on upgrade (using the EKS console), all changes would be lost, right?
Any update on this? It's affecting us a lot
Any update on the release of this feature? I guess it has already been implemented on some of the add-ons.
If there is a best-practice workaround, please let us know :)
I too am affected by this. Please give us the ability to set a nodeSelector (for the observability controller) and tolerations for all the deployments/daemonsets in the cloudwatch observability add-on.
Currently I'd have to deploy it in a cluster which has untainted nodes and patch the tolerations/nodeSelector post-deployment.
example for those interested:
resource "null_resource" "patch_deployment" {
  # Re-run the patch whenever the add-on version changes.
  triggers = {
    addon = var.addon_version
  }

  # Patch the controller deployment once the add-on is installed.
  provisioner "local-exec" {
    command = "kubectl patch deployment amazon-cloudwatch-observability-controller-manager -n amazon-cloudwatch --patch-file ${local_file.patch.filename}"
  }

  depends_on = [local_file.patch, aws_eks_addon.cloudwatch_agent]
}
I set the following properties as well - in the hopes the modifications will stay:
resolve_conflicts_on_create = "NONE"
resolve_conflicts_on_update = "NONE"
local-exec is a dirty hack though :( It'd be great if the eks-addon could support this (like the other ones)
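For anyone trying this workaround: the snippet above references `local_file.patch` and `aws_eks_addon.cloudwatch_agent` without showing them. The following is only a sketch of what those two resources might look like, not the commenter's actual configuration. The patch file name, the `cluster_name` variable, and the patch contents (a catch-all toleration on the controller's pod template) are assumptions made for illustration.

```hcl
# Hypothetical inputs; the thread only shows var.addon_version being used.
variable "cluster_name" {
  type = string
}

variable "addon_version" {
  type = string
}

# The add-on itself, with the conflict settings mentioned in the comment above.
resource "aws_eks_addon" "cloudwatch_agent" {
  cluster_name                = var.cluster_name
  addon_name                  = "amazon-cloudwatch-observability"
  addon_version               = var.addon_version
  resolve_conflicts_on_create = "NONE"
  resolve_conflicts_on_update = "NONE"
}

# Assumed patch file consumed by `kubectl patch --patch-file`.
# It sets a tolerate-everything toleration on the deployment's pod template.
resource "local_file" "patch" {
  filename = "${path.module}/controller-patch.yaml" # hypothetical path
  content = yamlencode({
    spec = {
      template = {
        spec = {
          tolerations = [
            { operator = "Exists" }
          ]
        }
      }
    }
  })
}
```

A `nodeSelector` map could presumably be added next to `tolerations` in the same patch. Whether the patched values survive a later add-on upgrade is not confirmed anywhere in this thread.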
@fluckx-cs Thanks for the help. It is hacky but it might be the only workaround for now.
I do not understand: AWS already exposes tolerations and other configuration for other add-ons such as EBS, Proxy, and VPC-CNI, so why not add the same for these daemonsets so customers can edit them?
@nirajdesai2909 I am sure they have it on their backlog, but it was likely not part of the initial MVP. Bringing attention to it is the best way to move it up in the roadmap and get the feature out.
If it wasn't on the roadmap, getting attention to it to put it on the near-future roadmap is probably still the best way to get the feature out there :)
This is also preventing us from rolling out the add-on in a no-touch manner
I'm also interested in this, but my situation is more that I want it to not try to run on Fargate nodes, which I had expected would be a no-brainer
Same here, we need a way to prevent it from running on Fargate nodes
I think this was resolved in 1.7.0
Starting with version 1.7.0 of the Amazon CloudWatch Observability EKS add-on, the add-on and the Helm chart by default set Kubernetes tolerations to tolerate all taints on the pod workloads that are installed by the add-on or the Helm chart. This ensures that daemonsets such as the CloudWatch agent and Fluent Bit can schedule pods on all nodes in your cluster by default. For more information about tolerations and taints, see Taints and Tolerations in the Kubernetes documentation.
The default tolerations set by the add-on or the Helm chart are as follows:
tolerations:
- operator: Exists
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html
It seems so (I double-checked the configuration). I'm still missing the node selectors (for the controller pods), though I only added that in my comment and it isn't in the original request, so I'm not sure if it's actually in scope :).
I can create another issue for it otherwise.
Closing this as completed. Open a separate request for other CW addon configuration param enhancements.