azure-vm-agents-plugin

Agent VM is not reused after Jenkins controller restart

Open · idontsov opened this issue 4 months ago · 2 comments

Jenkins and plugins versions report

Environment
Jenkins: 2.516.1
OS: Linux - 5.10.0-35-cloud-amd64
Java: 17.0.15 - Debian (OpenJDK 64-Bit Server VM)
---
Office-365-Connector:5.1.0
antisamy-markup-formatter:173.v680e3a_b_69ff3
apache-httpcomponents-client-4-api:4.5.14-269.vfa_2321039a_83
apache-httpcomponents-client-5-api:5.5-150.veb_76e719855b_
asm-api:9.8-163.vb_2a_96d3f9c3c
authentication-tokens:1.144.v5ff4a_5ec5c33
azure-ad:580.v2f665882b_a_71
azure-credentials:357.v6447d38fb_007
azure-sdk:239.v0e088b_133a_77
azure-vm-agents:1043.v6c34a_871003d
blueocean:1.27.21
blueocean-bitbucket-pipeline:1.27.21
blueocean-commons:1.27.21
blueocean-config:1.27.21
blueocean-core-js:1.27.21
blueocean-dashboard:1.27.21
blueocean-display-url:2.4.4
blueocean-events:1.27.21
blueocean-git-pipeline:1.27.21
blueocean-github-pipeline:1.27.21
blueocean-i18n:1.27.21
blueocean-jwt:1.27.21
blueocean-personalization:1.27.21
blueocean-pipeline-api-impl:1.27.21
blueocean-pipeline-editor:1.27.21
blueocean-pipeline-scm-api:1.27.21
blueocean-rest:1.27.21
blueocean-rest-impl:1.27.21
blueocean-web:1.27.21
bootstrap5-api:5.3.7-2
bouncycastle-api:2.30.1.81-264.v95c79c0e772c
branch-api:2.1235.v04e86c7ce54c
build-timeout:1.38
caffeine-api:3.2.2-178.v353b_8428ed56
checks-api:373.vfe7645102093
cloud-stats:377.vd8a_6c953e98e
cloudbees-bitbucket-branch-source:936.4.4
cloudbees-folder:6.1037.v4cb_8573b_72a_a_
commons-httpclient3-api:3.1-3
commons-lang3-api:3.18.0-98.v3a_674c06072d
commons-text-api:1.14.0-194.v804a_dc3a_1b_d8
credentials:1419.v2337d1ceceef
credentials-binding:702.vfe613e537e88
display-url-api:2.217.va_6b_de84cc74b_
durable-task:595.ve87b_f1318d67
echarts-api:6.0.0-1
eddsa-api:0.3.0.1-19.vc432d923e5ee
favorite:2.237.v79163ca_8b_892
font-awesome-api:7.0.0-1
git:5.7.0
git-client:6.3.0
git-parameter:444.vca_b_84d3703c2
github:1.44.0
github-api:1.321-488.v9b_c0da_9533f8
github-branch-source:1834.v857721ea_74c6
gson-api:2.13.1-153.vb_3d0c48a_a_b_4a_
handy-uri-templates-2-api:2.1.8-36.v85e4cb_234a_13
htmlpublisher:427
instance-identity:203.v15e81a_1b_7a_38
ionicons-api:94.vcc3065403257
jackson2-api:2.19.2-408.v18248a_324cfe
jakarta-activation-api:2.1.3-2
jakarta-mail-api:2.1.3-2
javax-activation-api:1.2.0-8
javax-mail-api:1.6.2-11
jaxb:2.3.9-133.vb_ec76a_73f706
jenkins-design-language:1.27.21
jjwt-api:0.11.5-120.v0268cf544b_89
jobConfigHistory:1343.v4b_e819a_ecdc2
joda-time-api:2.14.0-149.v1c3ce991d1b_9
jquery3-api:3.7.1-3
jsch:0.2.16-95.v3eecb_55fa_b_78
json-api:20250517-163.v1c5da_e99c775
json-path-api:2.9.0-178.vca_b_c71881321
junit:1335.v6b_a_a_e18534e1
mailer:515.vd788654779b_1
matrix-auth:3.2.6
matrix-project:849.v0cd64ed7e531
mina-sshd-api-common:2.15.0-161.vb_200831a_c15b_
mina-sshd-api-core:2.15.0-161.vb_200831a_c15b_
netty-api:4.1.118.Final-9.v776038d601a_7
nunit:593.v76f7a_5f959c1
okhttp-api:4.11.0-189.v976fa_d3379d6
pipeline-build-step:571.v08a_fffd4b_0ce
pipeline-graph-analysis:241.vc3d48fb_b_2582
pipeline-groovy-lib:752.vdddedf804e72
pipeline-input-step:532.v9e7466cb_4406
pipeline-milestone-step:138.v78ca_76831a_43
pipeline-model-api:2.2258.v4e96d2b_da_f9b_
pipeline-model-definition:2.2258.v4e96d2b_da_f9b_
pipeline-model-extensions:2.2258.v4e96d2b_da_f9b_
pipeline-rest-api:2.38
pipeline-stage-step:322.vecffa_99f371c
pipeline-stage-tags-metadata:2.2258.v4e96d2b_da_f9b_
pipeline-stage-view:2.38
plain-credentials:199.v9f8e1f741799
plugin-util-api:6.1.0
popper2-api:2.11.6-5
pubsub-light:1.19
role-strategy:799.v5b_e7b_ecc231e
scm-api:707.v749f968369d4
script-security:1378.vf25626395f49
snakeyaml-api:2.3-125.v4d77857a_b_402
sse-gateway:1.28
ssh-credentials:361.vb_f6760818e8c
sshd:3.372.v5d04a_e92d8cf
structs:353.v261ea_40a_80fb_
testcafe:1.0
timestamper:1.30
token-macro:477.vd4f0dc3cb_cf1
trilead-api:2.209.v0e69b_c43c245
variant:70.va_d9f17f859e0
workflow-aggregator:608.v67378e9d3db_1
workflow-api:1382.veca_a_efe062fa_
workflow-basic-steps:1079.vce64b_a_929c5a_
workflow-cps:4175.ve65b_fa_663eed
workflow-durable-task-step:1446.v3efd13441220
workflow-job:1540.v295eccc9778f
workflow-multibranch:810.v6b_6e77da_7058
workflow-scm-step:437.v05a_f66b_e5ef8
workflow-step-api:706.v518c5dcb_24c0
workflow-support:976.vb_d9493c2eb_09

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller: Debian Bullseye

Reproduction steps

We use non-ephemeral Azure VM agents (ShutdownOnIdle = true).

  1. Start a job that requires an agent node.
  2. Jenkins deploys an agent VM to Azure and executes the job.
  3. Jenkins shuts down the agent VM after the idle timeout.
  4. Restart the Jenkins controller.
  5. Start the job again.

Expected Results

Jenkins should find and start the existing VM:

Jenkins is fully up and running
Aug 07, 2025 5:38:49 PM INFO com.microsoft.azure.vmagent.remote.AzureVMAgentSSHLauncher launch
Agent builder-frontend100970 is shut down, deleted, etc. Not attempting to connect
Aug 07, 2025 5:42:51 PM INFO com.microsoft.azure.vmagent.AzureVMManagementServiceDelegate virtualMachineExists
Checking VM exists for builder-frontend100970
Aug 07, 2025 5:42:52 PM INFO com.microsoft.azure.vmagent.AzureVMCloud provision
1 planned node(s)
Aug 07, 2025 5:42:52 PM INFO org.jenkinsci.plugins.cloudstats.CloudStatistics logTypeNotSupported
No support for cloud-stats plugin by class hudson.slaves.NodeProvisioner$PlannedNode
Aug 07, 2025 5:42:52 PM INFO com.microsoft.azure.vmagent.AzureVMCloud lambda$provision$1
Found existing node, starting VM builder-frontend100970
Aug 07, 2025 5:42:52 PM INFO com.microsoft.azure.vmagent.AzureVMManagementServiceDelegate startVirtualMachine
Starting: builder-frontend100970

Actual Results

Jenkins does not attempt to start the existing VM; instead, it tries to deploy a new one:

Agent builder-frontendbb1020 is shut down, deleted, etc. Not attempting to connect
Aug 07, 2025 4:05:37 PM INFO com.microsoft.azure.vmagent.AzureVMCloud calculateNumberOfAgentsToRequest
Wanted to create 1 nodes from template builder-frontend but cannot create any, have template limit of 1 but have 1 VMs already so we can have 0 more, currently have 1 VMs in cloud
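The log line above describes a capacity check. This is a rough illustration only. It assumes that the stopped VM still counts toward the template limit, as the message "have 1 VMs already" suggests. Under that assumption, the arithmetic works out as in the following Groovy sketch. `agentsToRequest` is a hypothetical name, not the plugin's actual method.

```groovy
// Hypothetical sketch of the capacity check described by the log line;
// not the azure-vm-agents plugin's real implementation.
int agentsToRequest(int wanted, int templateLimit, int existingVms) {
    // Assumption: every VM created from the template counts against the limit,
    // including a stopped VM that is no longer treated as reusable.
    int headroom = Math.max(0, templateLimit - existingVms)
    return Math.min(wanted, headroom)
}

// Values from the log: wanted 1, template limit 1, one VM already exists.
println agentsToRequest(1, 1, 1)   // prints 0: no new VM can be deployed
```

With a limit of 1, the stopped VM uses up all remaining capacity. If that VM is also not started again, the job has no agent to run on.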

Anything else?

I found that some agent node properties are lost after the controller restarts. In particular:

  • eligibleForReuse is reset from 'true' to 'false'
  • cleanUpAction is reset from 'SHUTDOWN' to 'BLOCK'

Note: The cleanUpAction property must be set to 'SHUTDOWN'; otherwise, based on my observations, the Jenkins controller will not stop the agent VM after the idle timeout.
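The following hypothetical Groovy model shows how the two reset values could produce the behavior described above. `AgentState`, `CleanUpAction`, `provisionDecision` and `onIdleTimeout` are illustrative names, not the plugin's classes. The rules encoded here are assumptions drawn from the log excerpts and the note above, not from the plugin source:

- Reuse depends on `eligibleForReuse`.
- Idle shutdown depends on `cleanUpAction`.

```groovy
// Hypothetical model; not the azure-vm-agents plugin's real classes or logic.
// Only the two cleanUpAction values observed in this issue are modeled.
enum CleanUpAction { BLOCK, SHUTDOWN }

class AgentState {
    String name
    boolean eligibleForReuse
    CleanUpAction cleanUpAction
}

// Assumed idle behavior, per the note above: only SHUTDOWN stops the VM.
String onIdleTimeout(AgentState agent) {
    agent.cleanUpAction == CleanUpAction.SHUTDOWN ? 'shut down VM' : 'leave VM running'
}

// Assumed provisioning order: start a reusable VM first, otherwise deploy if under the limit.
String provisionDecision(List<AgentState> existing, int templateLimit) {
    AgentState reusable = existing.find { it.eligibleForReuse }
    if (reusable != null) {
        return "start existing VM ${reusable.name}"
    }
    if (existing.size() < templateLimit) {
        return 'deploy new VM'
    }
    return 'wait: template limit reached'
}

def beforeRestart = new AgentState(name: 'builder-frontend100970',
        eligibleForReuse: true, cleanUpAction: CleanUpAction.SHUTDOWN)
def afterRestart = new AgentState(name: 'builder-frontend100970',
        eligibleForReuse: false, cleanUpAction: CleanUpAction.BLOCK)

println provisionDecision([beforeRestart], 1)  // start existing VM builder-frontend100970
println provisionDecision([afterRestart], 1)   // wait: template limit reached
println onIdleTimeout(afterRestart)            // leave VM running
```

In this model, restoring both values makes the stopped VM the first candidate again. That matches the effect the workaround below aims for.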

Workaround

To work around the issue, I run the following Groovy script after the controller restarts:

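import jenkins.model.Jenkins
import com.microsoft.azure.vmagent.AzureVMAgent

// Restore the agent settings that are reset when the controller restarts.
for (node in Jenkins.instance.nodes) {
    // Only Azure VM agents have these properties; skip any other node types.
    if (!(node instanceof AzureVMAgent)) {
        continue
    }
    println node.name
    println "  eligibleForReuse: ${node.eligibleForReuse}"
    println "  cleanUpAction: ${node.cleanUpAction}"

    node.eligibleForReuse = true
    node.cleanUpAction = 'SHUTDOWN'  // needed for the VM to be stopped after the idle timeout
}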

Are you interested in contributing a fix?

No response

idontsov · Aug 12 '25 14:08

Any update on this issue?

idontsov · Oct 15 '25 15:10

I've not had time, but contributions are welcome.

timja · Oct 15 '25 15:10