kubernetes-plugin
kubernetes-plugin copied to clipboard
[JENKINS-67679] add maintenance mode to pause provisioning nodes
fix JENKINS-67679 add maintenance mode to pause provisioning nodes
- [x] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
- [x] Ensure that the pull request title represents the desired changelog entry
- [x] Please describe what you did
- [x] Link to relevant issues in GitHub or Jira
- [x] Link to relevant pull requests, esp. upstream and downstream changes
- [x] Ensure you have provided tests - that demonstrates feature works or fixes the issue
Is this really necessary? Can you not get a similar effect by putting some sort of taint on the cluster’s nodes, or making an admission webhook reject new pods? Or would builds fail with an error after timing out trying to schedule a pod?
If this really needs to be done on the Jenkins side, maybe it would be better for core to have the ability to take a Cloud
offline.
If this really needs to be done on the Jenkins side, maybe it would be better for core to have the ability to take a Cloud offline.
This looks like a good idea actually.
Is this really necessary? Can you not get a similar effect by putting some sort of taint on the cluster’s nodes, or making an admission webhook reject new pods? Or would builds fail with an error after timing out trying to schedule a pod?
The permissions that a Jenkins admin are given are not always the same as the set of permissions of the Kubernetes cluster (even not permitted to access the cluster API). This PR is really legit in term of usage.
If this really needs to be done on the Jenkins side, maybe it would be better for core to have the ability to take a Cloud offline.
That is really an excellent idea: this change would benefit way more users. Where should it be looked (a particular plugin? or the Jenkins core)?
Is this really necessary? Can you not get a similar effect by putting some sort of taint on the cluster’s nodes, or making an admission webhook reject new pods? Or would builds fail with an error after timing out trying to schedule a pod?
If this really needs to be done on the Jenkins side, maybe it would be better for core to have the ability to take a
Cloud
offline.
maintenance usually includes cordon and drain, cordon will put a taint on node, drain will evict pods to other nodes (drain is necessary for kernel upgrade, docker/containerd uprade etc)
kubectl drain node-name--delete-local-data --ignore-daemonsets
Jenkins job pod will be deleted from the node and job will fail. Even if we enable retry on job level, plugin may launch a pod on other nodes which will also be drained. You may ask why don't put all node on cordon, that is overkill since kubernetes service are not affected by cordon and drain, kubernetes has logic to maintain PodDisruptionBudget in drain operation. Jenkins job is the only victim, we can not freeze whole cluster for jenkins if the cluster is shared.
I tried to workaround the issue by using a system groovy job to put k8s plugin on hold
import jenkins.model.Jenkins
import org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud;
def instance=Jenkins.get()
// we only care first item since we know only one defined
def clouds=instance.clouds.getAll(KubernetesCloud.class).findAll{it -> filter-clause }
clouds.each{ cloud ->
cloud.containerCap=-1
}
But if jenkins itself is running in k8s cluster and gets restarted, the containerCap will lost due to check (it is a valid check)
public void setContainerCap(Integer containerCap) {
this.containerCap = (containerCap != null && containerCap > 0) ? containerCap : null;
}
public String getContainerCapStr() {
// null, serialized Integer.MAX_VALUE, or 0 means no limit
return (containerCap == null || containerCap == Integer.MAX_VALUE || containerCap == 0) ? "" : String.valueOf(containerCap);
}
so I think it is better to add a self-explaining flag
Jenkins job pod will be deleted from the node and job will fail.
FYI: #1083
Hi,
I'm looking a similar feature to put a cloud in maintenance mode.
During my search I found the agent-maintenance-plugin
but it cover partially my use case of permanent jenkins agents.
Regards,
Taking another look at this request: I think the agent-maintenance-plugin
could be extended to support clouds. There is an existing CloudProvisioningListener
extension point allowing to prevent cloud provisioning on a specific cloud and label (hudson.slaves.CloudProvisioningListener#canProvision(hudson.slaves.Cloud, hudson.slaves.Cloud.CloudState, int)
). This would cover the use case for any kind of cloud, as well as providing a central place to configure such maintenance windows.