azure-vm-agents-plugin
azure-vm-agents-plugin copied to clipboard
Agents sometimes left as suspended after starting from shutdown state
When using idle retention strategy and setting shutting down only option, sometimes the agents may not be able to connect successfully. And it will stay as suspended all the time.
Some useful log from @Sander Bel ,
2019-05-09 09:20:05.878+0000 [id=1060] FINE hudson.model.AsyncPeriodicWork$1#run: Started Azure VM Maintainer Pool Size 2019-05-09 09:20:05.879+0000 [id=1060] FINE hudson.model.AsyncPeriodicWork$1#run: Finished Azure VM Maintainer Pool Size. 1 ms 2019-05-09 09:22:05.560+0000 [id=1165] FINE hudson.model.AsyncPeriodicWork$1#run: Started Azure VM Agents Clean Task 2019-05-09 09:22:05.560+0000 [id=1165] FINE c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: start 2019-05-09 09:22:05.561+0000 [id=1165] FINE c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: Running clean with 5 minute timeout 2019-05-09 09:22:05.561+0000 [id=1166] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanVMs: AzureVMAgentCleanUpTask: cleanVMs: node ubuntu-test3c8d10 blocked to cleanup 2019-05-09 09:22:05.561+0000 [id=1166] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Cleaning deployments 2019-05-09 09:22:05.561+0000 [id=1166] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Checking deployment ubuntu-test0509110640988 2019-05-09 09:22:07.340+0000 [id=1166] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Deployment created on 5/9/19 11:08 AM 2019-05-09 09:22:07.340+0000 [id=1166] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Deployment newer than timeout, keeping 2019-05-09 09:22:07.340+0000 [id=1166] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Done cleaning deployments 2019-05-09 09:22:07.439+0000 [id=1165] FINE c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: end 2019-05-09 09:22:07.439+0000 [id=1165] FINE hudson.model.AsyncPeriodicWork$1#run: Finished Azure VM Agents Clean Task. 1,879 ms 2019-05-09 09:22:54.407+0000 [id=104] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label mdb-sandbox-acs 2019-05-09 09:22:54.407+0000 [id=104] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test 2019-05-09 09:22:54.407+0000 [id=104] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu 2019-05-09 09:22:54.407+0000 [id=104] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template acostest 2019-05-09 09:23:13.239+0000 [id=103] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label chef 2019-05-09 09:23:13.239+0000 [id=103] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test 2019-05-09 09:23:13.239+0000 [id=103] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: ubuntu-test matches! 2019-05-09 09:23:13.239+0000 [id=103] INFO c.m.azure.vmagent.AzureVMCloud#provision: AzureVMCloud: provision: start for label chef workLoad 1 2019-05-09 09:23:13.240+0000 [id=103] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label chef 2019-05-09 09:23:13.240+0000 [id=103] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test 2019-05-09 09:23:13.240+0000 [id=103] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: ubuntu-test matches! 2019-05-09 09:23:13.240+0000 [id=103] INFO c.m.azure.vmagent.AzureVMCloud#provision: AzureVMCloud: provision: checking for node reuse options 2019-05-09 09:23:13.240+0000 [id=103] INFO c.m.azure.vmagent.AzureVMCloud#provision: AzureVMCloud: provision: agent computer eligible for reuse ubuntu-test3c8d10 2019-05-09 09:23:13.240+0000 [id=103] INFO c.m.a.v.AzureVMManagementServiceDelegate#virtualMachineExists: AzureVMManagementServiceDelegate: virtualMachineExists: check for ubuntu-test3c8d10 2019-05-09 09:23:14.168+0000 [id=103] INFO c.m.a.v.AzureVMManagementServiceDelegate#virtualMachineExists: AzureVMManagementServiceDelegate: virtualMachineExists: ubuntu-test3c8d10 exists 2019-05-09 09:23:14.169+0000 [id=103] INFO c.m.azure.vmagent.AzureVMCloud#provision: AzureVMCloud: provision: asynchronous provision finished, returning 1 planned node(s) 2019-05-09 09:23:14.169+0000 [id=930] INFO c.m.azure.vmagent.AzureVMCloud#lambda$provision$0: Found existing node, starting VM ubuntu-test3c8d10 2019-05-09 09:23:14.170+0000 [id=930] INFO c.m.a.v.AzureVMManagementServiceDelegate#startVirtualMachine: AzureVMManagementServiceDelegate: startVirtualMachine: ubuntu-test3c8d10 2019-05-09 09:24:03.370+0000 [id=930] INFO c.m.a.v.AzureVMManagementServiceDelegate#setVirtualMachineDetails: The Azure agent doesn't have a public IP. Will use the private IP 2019-05-09 09:24:03.370+0000 [id=930] INFO c.m.a.v.AzureVMManagementServiceDelegate#setVirtualMachineDetails: Azure agent details: nodeNameubuntu-test3c8d10 adminUserName=jenkins_node shutdownOnIdle=true retentionTimeInMin=0 labels=chef 2019-05-09 09:24:03.370+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: launch method called for agent ubuntu-test3c8d10 2019-05-09 09:24:05.920+0000 [id=1348] INFO c.m.a.v.AzureVMManagementServiceDelegate#isVMAliveOrHealthy: AzureVMManagementServiceDelegate: isVMAliveOrHealthy: status PowerState/running 2019-05-09 09:24:05.921+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#connectToSsh: AzureVMAgentSSHLauncher: connectToSsh: start 2019-05-09 09:24:05.921+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#getRemoteSession: AzureVMAgentSSHLauncher: getRemoteSession: getting remote session for user jenkins to host 10.76.96.77:22 2019-05-09 09:24:37.558+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#getRemoteSession: AzureVMAgentSSHLauncher: getRemoteSession: Got remote session for user jenkins to host 10.76.96.77:22 2019-05-09 09:24:37.559+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#connectToSsh: AzureVMAgentSSHLauncher: connectToSsh: Got remote connection 2019-05-09 09:24:37.559+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#executeRemoteCommand: AzureVMAgentSSHLauncher: executeRemoteCommand: starting test -e ~/.azure-agent-init 2019-05-09 09:24:38.811+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#executeRemoteCommand: AzureVMAgentSSHLauncher: executeRemoteCommand: executed successfully 2019-05-09 09:24:38.811+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: checking for java runtime 2019-05-09 09:24:38.811+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#executeRemoteCommand: AzureVMAgentSSHLauncher: executeRemoteCommand: starting java -fullversion 2019-05-09 09:24:40.152+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#executeRemoteCommand: AzureVMAgentSSHLauncher: executeRemoteCommand: executed successfully 2019-05-09 09:24:40.152+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: java runtime present, copying slave.jar to remote 2019-05-09 09:24:40.154+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#copyFileToRemote: AzureVMAgentSSHLauncher: copyFileToRemote: Initiating file transfer to slave.jar 2019-05-09 09:24:51.720+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#copyFileToRemote: AzureVMAgentSSHLauncher: copyFileToRemote: copied file Successfully to slave.jar 2019-05-09 09:24:51.721+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: launching agent: java -jar slave.jar 2019-05-09 09:24:51.742+0000 [id=1348] INFO c.m.a.v.r.AzureVMAgentSSHLauncher#launch: AzureVMAgentSSHLauncher: launch: Connected successfully 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label chef2 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: test-jenkins-ubuntu matches! 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label azure-acos-atlasco 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template acostest 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: acostest matches! 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label mdb-sandbox-acs 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu 2019-05-09 09:24:56.983+0000 [id=97] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template acostest 2019-05-09 09:25:05.878+0000 [id=1474] FINE hudson.model.AsyncPeriodicWork$1#run: Started Azure VM Maintainer Pool Size 2019-05-09 09:25:05.879+0000 [id=1474] FINE hudson.model.AsyncPeriodicWork$1#run: Finished Azure VM Maintainer Pool Size. 1 ms 2019-05-09 09:25:13.239+0000 [id=101] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Retrieving agent template with label mdb-sandbox-acs 2019-05-09 09:25:13.239+0000 [id=101] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template ubuntu-test 2019-05-09 09:25:13.239+0000 [id=101] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template test-jenkins-ubuntu 2019-05-09 09:25:13.239+0000 [id=101] FINE c.m.azure.vmagent.AzureVMCloud#getAzureAgentTemplate: AzureVMCloud: getAzureAgentTemplate: Found agent template acostest 2019-05-09 09:27:05.560+0000 [id=1812] FINE hudson.model.AsyncPeriodicWork$1#run: Started Azure VM Agents Clean Task 2019-05-09 09:27:05.561+0000 [id=1812] FINE c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: start 2019-05-09 09:27:05.561+0000 [id=1812] FINE c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: Running clean with 5 minute timeout 2019-05-09 09:27:05.561+0000 [id=1813] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Cleaning deployments 2019-05-09 09:27:05.561+0000 [id=1813] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Checking deployment ubuntu-test0509110640988 2019-05-09 09:27:07.711+0000 [id=1813] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Deployment created on 5/9/19 11:08 AM 2019-05-09 09:27:07.711+0000 [id=1813] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Deployment newer than timeout, keeping 2019-05-09 09:27:07.712+0000 [id=1813] FINE c.m.a.v.AzureVMAgentCleanUpTask#cleanDeployments: AzureVMAgentCleanUpTask: cleanDeployments: Done cleaning deployments 2019-05-09 09:27:07.800+0000 [id=1812] FINE c.m.a.v.AzureVMAgentCleanUpTask#execute: AzureVMAgentCleanUpTask: execute: end 2019-05-09 09:27:07.800+0000 [id=1812] FINE hudson.model.AsyncPeriodicWork$1#run: Finished Azure VM Agents Clean Task. 2,240 ms 2019-05-09 09:28:46.078+0000 [id=101] INFO c.m.a.v.AzureVMCloudRetensionStrategy#check: AzureVMCloudRetensionStrategy: check: Idle timeout reached for agent: ubuntu-test3c8d10, action: shutdown 2019-05-09 09:28:46.079+0000 [id=1967] INFO c.m.a.v.AzureVMCloudRetensionStrategy$1#call: AzureVMCloudRetensionStrategy: going to idleTimeout
[JENKINS-57447] created by jieshe
This issue seems to be caused by https://github.com/jenkinsci/azure-vm-agents-plugin/blob/d83f4153627fc6f64c166217bac59e27876ee984/src/main/java/com/microsoft/azure/vmagent/remote/AzureVMAgentSSHLauncher.java#L250 . This method seems to be hanged forever and the below logs do not be triggered. Need more investigation on this because this method will call Jenkins core code.
by jieshe
Hi Jie, do you have an update?
by bel_sander
My assumption here seems to be wrong. This issue seems not be hung on the setChannel method. Any help or suggestion here will be welcome.
by jieshe
Jie Shen I have the issue you describe.
We have agents configured to shutdown after some idle time. When agent need to be reused, it come back online but never goes out suspended state. It appears from thread dump that it's blocked in https://github.com/jenkinsci/azure-vm-agents-plugin/blob/d83f4153627fc6f64c166217bac59e27876ee984/src/main/java/com/microsoft/azure/vmagent/remote/AzureVMAgentSSHLauncher.java#L250 calling https://github.com/jenkinsci/branch-api-plugin/blob/branch-api-2.5.4/src/main/java/jenkins/branch/WorkspaceLocatorImpl.java#L534:
Thread 31073: (state = BLOCKED) - jenkins.branch.WorkspaceLocatorImpl$Collector.onOnline(hudson.model.Computer, hudson.model.TaskListener) @bci=27, line=534 (Interpreted frame) - hudson.slaves.SlaveComputer.setChannel(hudson.remoting.Channel, java.io.OutputStream, hudson.remoting.Channel$Listener) @bci=662, line=698 (Interpreted frame) - hudson.slaves.SlaveComputer.setChannel(java.io.InputStream, java.io.OutputStream, java.io.OutputStream, hudson.remoting.Channel$Listener) @bci=82, line=432 (Interpreted frame) - com.microsoft.azure.vmagent.remote.AzureVMAgentSSHLauncher.launch(hudson.slaves.SlaveComputer, hudson.model.TaskListener) @bci=977, line=250 (Interpreted frame) - hudson.slaves.SlaveComputer$1.call() @bci=88, line=294 (Interpreted frame) - jenkins.util.ContextResettingExecutorService$2.call() @bci=18, line=46 (Compiled frame) - jenkins.security.ImpersonatingExecutorService$2.call() @bci=17, line=71 (Interpreted frame) - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame) - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Interpreted frame) - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame) - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame)
by michaelburtin
I can't reproduce this