[feature] Add option to set toleration for tainted nodes

Open bebosudo opened this issue 2 years ago • 8 comments

There are some node groups in the prod cluster on which wave can start pods to build its images, but currently any pod may end up running on those nodes. One problem with this is that pods may be scheduled on an ARM node and crash due to architecture incompatibility.

We'd prefer to taint the nodes so that only pods running with an explicit toleration would be scheduled on those nodes. @pditommaso mentioned that wave may not have this capability for now, so I'm opening a feature request for this.

bebosudo avatar Sep 06 '23 13:09 bebosudo

How should the toleration be specified? In terms of the K8s spec, I mean.

pditommaso avatar Sep 07 '23 15:09 pditommaso

How should the toleration be specified? In terms of the K8s spec, I mean.

If you mean how a pod submission with a toleration works in Kubernetes, here's an example for the gpu-compute nodegroup in tower-dev: https://github.com/seqeralabs/infrastructure/pull/166#issuecomment-1683636834

bebosudo avatar Sep 08 '23 07:09 bebosudo

@bebosudo we already have a node selector in Wave. How is a toleration different? https://github.com/seqeralabs/wave/blob/0ab351f26a9b4dd680c0bc84c656d7988ab4b29e/src/main/groovy/io/seqera/wave/service/k8s/K8sServiceImpl.groovy#L370

munishchouhan avatar Dec 26 '23 17:12 munishchouhan

@bebosudo please see if Munish's comment above addresses your point

marcodelapierre avatar Jan 08 '24 04:01 marcodelapierre

Sorry, I missed the notification of Munish's comment in my inbox.

NodeSelector is a pod-side feature that tells k8s to allocate a pod to a specific set of nodes. NodeSelector doesn't prevent other pods from being scheduled on the same nodes, which is what we want to avoid: currently there are two sets of nodes where wave should build containers, one for x86_64 and one for arm64. We had to set taints on them because k8s was seeing them as unused servers and was scheduling pods on them, sometimes causing problems when x86_64 containers tried to run on arm64 nodes.

This is why taints are useful in combination with tolerations: with taints you can mark a node with certain flags, and only pods with tolerations matching those flags will be allowed to run on the tainted nodes. That is the reason for requesting support for tolerations in wave.
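To make the difference from nodeSelector concrete, here is a minimal Python sketch of the matching rules, not wave's code and not the real scheduler. The taint key `wave-build` and the helper names `tolerates` and `can_schedule` are made up for illustration; `kubernetes.io/arch` is the standard architecture label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str          # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key is only valid with operator "Exists"
    operator: str = "Equal"  # "Equal" (the default) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(toleration, taint):
    """Return True if the toleration matches the taint."""
    if not toleration.key and toleration.operator != "Exists":
        return False
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    return toleration.operator == "Equal" and toleration.value == taint.value


def can_schedule(node_labels, node_taints, node_selector, tolerations):
    """Simplified check: the selector must match and every hard taint must be tolerated."""
    selector_ok = all(node_labels.get(k) == v for k, v in node_selector.items())
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    taints_ok = all(any(tolerates(tol, t) for tol in tolerations) for t in hard)
    return selector_ok and taints_ok


ARM_LABELS = {"kubernetes.io/arch": "arm64"}
# Hypothetical taint applied to the ARM build nodes.
ARM_TAINT = Taint(key="wave-build", value="arm64", effect="NoSchedule")
BUILD_TOLERATION = Toleration(key="wave-build", operator="Equal",
                              value="arm64", effect="NoSchedule")

# A generic pod (no selector, no tolerations) lands on an untainted ARM node...
print(can_schedule(ARM_LABELS, [], {}, []))            # True
# ...but is kept away once the node is tainted.
print(can_schedule(ARM_LABELS, [ARM_TAINT], {}, []))   # False
# A build pod with a matching selector and toleration is still accepted.
print(can_schedule(ARM_LABELS, [ARM_TAINT],
                   {"kubernetes.io/arch": "arm64"}, [BUILD_TOLERATION]))  # True
```

The node selector only steers the build pod towards the ARM nodes; it is the taint that keeps every other pod away from them.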

bebosudo avatar Jan 08 '24 10:01 bebosudo

Similar to tolerations are node affinity rules: pods may want to select a specific node to run on depending on certain features defined in labels. Node affinity is the evolution of nodeSelector: it can select a node using a label and do much more. We use it to force pods to run in the same AZ where we know an EBS volume is located; otherwise the pod would be stuck with a volume constraint error.

Node affinity can't replace taints and tolerations though: they are two independent (but similar) features.
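As a rough illustration of that independence, the two settings live in separate fields of the pod spec. The sketch below builds such a fragment as a plain Python dict; the zone `eu-west-1a`, the taint key `wave-build`, and the helper names are assumptions for the example, while the field names follow the Kubernetes pod spec.

```python
import json


def zone_affinity(zones):
    """Node affinity that requires the node to be in one of the given zones."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "topology.kubernetes.io/zone",
                                "operator": "In",
                                "values": list(zones),
                            }
                        ]
                    }
                ]
            }
        }
    }


def toleration(key, value, effect, operator="Equal"):
    """Toleration entry allowing the pod onto nodes carrying the matching taint."""
    return {"key": key, "operator": operator, "value": value, "effect": effect}


# Hypothetical values: adjust the zone and taint to the actual cluster.
pod_spec_fragment = {
    "affinity": zone_affinity(["eu-west-1a"]),
    "tolerations": [toleration("wave-build", "arm64", "NoSchedule")],
}

print(json.dumps(pod_spec_fragment, indent=2))
```

Dropping either field leaves the other one valid, but with a different effect: affinity alone does not let the pod onto tainted nodes, and a toleration alone does not pin it to a zone.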

Considering the plan to allow customers to use wave on-prem, I consider these features important and they will probably be requested by customers sooner rather than later.

bebosudo avatar Jan 08 '24 10:01 bebosudo

@bebosudo Thanks for the explanation

munishchouhan avatar Jan 08 '24 11:01 munishchouhan

I have created a PR to add a toleration configuration (key, value, operator and effect) for ARM build and scan pods

munishchouhan avatar Jan 08 '24 15:01 munishchouhan

Not planned

pditommaso avatar Aug 18 '24 08:08 pditommaso