containers-roadmap

[EKS] [request]: Add default toleration for Amazon CloudWatch Observability EKS add-on

Open fulcrum29 opened this issue 9 months ago • 10 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

Please add a default toleration to the Amazon CloudWatch Observability EKS add-on, as other add-ons already have, so that cloudwatch-agent and fluentbit deploy on all nodes regardless of their taints:

tolerations: 
- operator: Exists
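
For context, a toleration with `operator: Exists` and no `key` matches every taint, which is why this single entry is enough. A small sketch contrasting it with a narrower toleration follows; the `dedicated` key and `gpu` value are hypothetical and not part of the request:

```yaml
# Requested form: an empty key with operator Exists tolerates all taints.
tolerations:
- operator: Exists
---
# Narrower alternative (hypothetical taint dedicated=gpu:NoSchedule):
# only that one taint is tolerated.
tolerations:
- key: dedicated
  operator: Equal
  value: gpu
  effect: NoSchedule
```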

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Fluent Bit and the CloudWatch agent installed through the Amazon CloudWatch Observability EKS add-on aren't deployed on all nodes because of node taints and the lack of default tolerations.

Are you currently working around this issue? No

Additional context None

Attachments https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html

fulcrum29 avatar Nov 09 '23 13:11 fulcrum29

Please also allow user to customize the configuration maps, such as fluent-bit-config

khacminh avatar Nov 10 '23 11:11 khacminh

While we wait for the feature, is there a best-practice workaround that would allow us to run the daemonset on all nodes? I thought of editing the daemonset directly, but then the next time we do an add-on upgrade (using the EKS console), all changes would be lost, right?

jinzishuai avatar Jan 05 '24 00:01 jinzishuai

Any update on this? It's affecting us a lot.

fulcrum29 avatar Feb 08 '24 09:02 fulcrum29

Any update on the release of this feature? I guess it has already been implemented on some of the other add-ons.

If there is a best-practice workaround, please let us know :)

nirajdesai2909 avatar Feb 13 '24 07:02 nirajdesai2909

I too am affected by this. Please give us the ability to set a nodeSelector (for the observability controller) and tolerations for all the deployments/daemonsets in the CloudWatch Observability add-on.

Currently I'd have to deploy it in a cluster that has untainted nodes and patch in the tolerations/nodeSelector after deployment.

Example for those interested:

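# Re-apply the patch whenever the add-on version changes.
resource "null_resource" "patch_deployment" {
  triggers = {
    addon = var.addon_version
  }

  # Requires kubectl on the machine running Terraform, configured for the target cluster.
  provisioner "local-exec" {
    command = "kubectl patch deployment amazon-cloudwatch-observability-controller-manager -n amazon-cloudwatch --patch-file ${local_file.patch.filename}"
  }

  # local_file.patch and aws_eks_addon.cloudwatch_agent are defined elsewhere (not shown).
  depends_on = [local_file.patch, aws_eks_addon.cloudwatch_agent]
}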
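
The patch file written by `local_file.patch` is not shown in this comment. A minimal sketch of what such a patch could contain, assuming the goal is to add a node selector and a tolerate-all toleration to the controller Deployment; the `role: observability` node label is a hypothetical placeholder:

```yaml
# Hypothetical patch file contents, passed to `kubectl patch --patch-file`.
spec:
  template:
    spec:
      # Assumed label; replace with a label that exists on your nodes.
      nodeSelector:
        role: observability
      # Tolerate every taint, matching the toleration requested in this issue.
      tolerations:
      - operator: Exists
```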

I set the following properties as well - in the hopes the modifications will stay:

  resolve_conflicts_on_create = "NONE"
  resolve_conflicts_on_update = "NONE"

local-exec is a dirty hack though :( It'd be great if the eks-addon could support this ( like the other ones )

fluckx-cs avatar Feb 14 '24 19:02 fluckx-cs

@fluckx-cs Thanks for the help. It is hacky but it might be the only workaround for now.

I do not understand: AWS already exposes tolerations and other configuration for other add-ons such as EBS, Proxy, and VPC-CNI, so why not expose the same for these daemonsets for customers to edit?

nirajdesai2909 avatar Feb 15 '24 16:02 nirajdesai2909

@nirajdesai2909 I am sure they have it on their backlog, but it was likely not part of the initial MVP. Bringing attention to it is the best way to move it up in the roadmap and get the feature out.

If it wasn't on the roadmap, getting attention to it to put it on the near-future roadmap is probably still the best way to get the feature out there :)

fluckx-cs avatar Feb 16 '24 08:02 fluckx-cs

This is also preventing us from rolling out the add-on in a no-touch manner.

cookiesowns avatar Apr 08 '24 20:04 cookiesowns

I'm also interested in this, but my situation is more that I want it not to try to run on Fargate nodes, which I had expected would be a no-brainer.

sidick avatar May 31 '24 10:05 sidick

Same here, we need a way to prevent it from running on Fargate nodes
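
One common way to keep a DaemonSet off Fargate is a node affinity rule on the `eks.amazonaws.com/compute-type` label. A sketch of the pod spec fragment follows; whether the add-on exposes a setting for this is not established in this thread, so where it would be applied is an assumption:

```yaml
# Pod spec fragment: schedule only on nodes that are not Fargate.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/compute-type
          operator: NotIn
          values:
          - fargate
```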

jcputney avatar May 31 '24 13:05 jcputney

I think this was resolved in 1.7.0

Starting with version 1.7.0 of the Amazon CloudWatch Observability EKS add-on, the add-on and the Helm chart by default set Kubernetes tolerations to tolerate all taints on the pod workloads that are installed by the add-on or the Helm chart. This ensures that daemonsets such as the CloudWatch agent and Fluent Bit can schedule pods on all nodes in your cluster by default. For more information about tolerations and taints, see Taints and Tolerations in the Kubernetes documentation.

The default tolerations set by the add-on or the Helm chart are as follows:

tolerations:
- operator: Exists

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html

bartoszgridgg avatar Jul 17 '24 07:07 bartoszgridgg

It seems so (I double-checked the configuration). I'm still missing the node selectors (for the controller pods), though I added that in my comment and it isn't in the original request, so I'm not sure it's actually in scope :).

I can create another issue for it otherwise.

fluckx-cs avatar Jul 17 '24 07:07 fluckx-cs

Closing this as completed. Open a separate request for other CW addon configuration param enhancements.

mikestef9 avatar Jul 17 '24 14:07 mikestef9