configuration-as-code-plugin

Manual Changes in UI Break Worker Connectivity

Open · thinkjk opened this issue 2 years ago · 0 comments

Jenkins and plugins versions report

Environment
Jenkins: 2.319.3
OS: Linux - 5.10.109+
---
configuration-as-code:1414.v878271fc496f

What Operating System are you using (both controller, and any agents involved in the problem)?

GCP GKE containers

Reproduction steps

  1. In the UI, under Configure Clouds, make any single change to the pod template (for example, change the sleep argument from 999999 to 99999).
  2. Run any Jenkins job

Expected Results

A job that succeeded before the change should succeed again.

Actual Results

The job fails because the agent never connects to the controller within the configured timeout (slaveConnectTimeout=100), even though the pod status is Running:

Error in provisioning; agent=KubernetesSlave name: jenkins-worker-2xht0, template=PodTemplate{id='a75f9c48-9d47-43a5-ab48-7e088a5be444', name='jenkins-worker', namespace='jenkins', slaveConnectTimeout=100, label='jenkins-worker', serviceAccount='service-account', containers=[ContainerTemplate{name='some-container', image='some-image', workingDir='/home/jenkins', command='sleep', args='99999', ttyEnabled=true, resourceRequestCpu='', resourceRequestMemory='', resourceRequestEphemeralStorage='', resourceLimitCpu='', resourceLimitMemory='', resourceLimitEphemeralStorage='', envVars=[KeyValueEnvVar [getValue()=tcp://localhost:2375, getKey()=DOCKER_HOST]], livenessProbe=ContainerLivenessProbe{execArgs='', timeoutSeconds=0, initialDelaySeconds=0, failureThreshold=0, periodSeconds=0, successThreshold=0}}, ContainerTemplate{name='dind-daemon', image='docker:18.06.3-ce-dind', privileged=true, workingDir='/home/jenkins', command='sleep', args='9999999', ttyEnabled=true, resourceRequestCpu='', resourceRequestMemory='', resourceRequestEphemeralStorage='', resourceLimitCpu='', resourceLimitMemory='', resourceLimitEphemeralStorage='', livenessProbe=ContainerLivenessProbe{execArgs='', timeoutSeconds=0, initialDelaySeconds=0, failureThreshold=0, periodSeconds=0, successThreshold=0}}]}
java.lang.IllegalStateException: Agent is not connected after 100 seconds, status: Running
	at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:244)
	at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:293)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Anything else?

Any time we make even a single manual change to the cloud configuration in the UI, connectivity between the worker and the controller breaks, and jobs fail with the error above. If we reload the existing configuration with configuration-as-code, the job works again. If we put the exact same change into the code and load that configuration, the job succeeds.

Is this the intended behavior of the plugin?

Thank you.

thinkjk, Jul 05 '22 23:07