ECS Agents Re-used unexpectedly

Open melbit-michaelw opened this issue 1 year ago • 4 comments

Jenkins and plugins versions report

Environment
Jenkins: 2.426.3
OS: Linux - 4.14.336-178.554.amzn1.x86_64
Java: 17.0.9 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
---
Office-365-Connector:4.21.0
amazon-ecr:1.114.vfd22430621f5
amazon-ecs:1.49
ansicolor:1.0.4
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
audit-trail:361.v82cde86c784e
authentication-tokens:1.53.v1c90fd9191a_b_
aws-credentials:218.v1b_e9466ec5da_
aws-java-sdk-ec2:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-ecr:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-ecs:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-efs:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-minimal:1.12.633-430.vf9a_e567a_244f
blueocean:1.27.10
blueocean-bitbucket-pipeline:1.27.10
blueocean-commons:1.27.10
blueocean-config:1.27.10
blueocean-core-js:1.27.10
blueocean-dashboard:1.27.10
blueocean-display-url:2.4.2
blueocean-events:1.27.10
blueocean-executor-info:1.27.10
blueocean-git-pipeline:1.27.10
blueocean-github-pipeline:1.27.10
blueocean-i18n:1.27.10
blueocean-jira:1.27.10
blueocean-jwt:1.27.10
blueocean-personalization:1.27.10
blueocean-pipeline-api-impl:1.27.10
blueocean-pipeline-editor:1.27.10
blueocean-pipeline-scm-api:1.27.10
blueocean-rest:1.27.10
blueocean-rest-impl:1.27.10
blueocean-web:1.27.10
bootstrap5-api:5.3.2-3
bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9
branch-api:2.1144.v1425d1c3d5a_7
build-timeout:1.32
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.0.2
cloudbees-bitbucket-branch-source:866.vdea_7dcd3008e
cloudbees-folder:6.858.v898218f3609d
command-launcher:100.v2f6722292ee8
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.11.0-95.v22a_d30ee5d36
concurrent-step:1.0.0
conditional-buildstep:1.4.3
config-file-provider:968.ve1ca_eb_913f8c
configuration-as-code:1775.v810dc950b_514
copyartifact:722.v0662a_9b_e22a_c
credentials:1319.v7eb_51b_3a_c97b_
credentials-binding:657.v2b_19db_7d6e6d
display-url-api:2.200.vb_9327d658781
docker-build-publish:1.4.0
docker-commons:439.va_3cb_0a_6a_fb_29
docker-workflow:572.v950f58993843
durable-task:547.vd1ea_007d100c
echarts-api:5.4.3-2
email-ext:2.104
envinject:2.908.v66a_774b_31d93
envinject-api:1.199.v3ce31253ed13
favorite:2.208.v91d65b_7792a_c
font-awesome-api:6.5.1-2
git:5.2.1
git-client:4.6.0
github:1.38.0
github-api:1.318-461.v7a_c09c9fa_d63
github-branch-source:1772.va_69eda_d018d4
gson-api:2.10.1-15.v0d99f670e0a_7
handy-uri-templates-2-api:2.1.8-30.v7e777411b_148
hashicorp-vault-plugin:364.vf5d54b_3dc313
htmlpublisher:1.32
http_request:1.18
icon-shim:3.0.0
instance-identity:185.v303dc7c645f9
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.16.1-373.ve709c6871598
jakarta-activation-api:2.0.1-3
jakarta-mail-api:2.0.1-3
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.9-1
jdk-tool:66.vd8fa_64ee91b_d
jenkins-design-language:1.27.10
jersey2-api:2.41-133.va_03323b_a_1396
jira:3.12
jjwt-api:0.11.5-77.v646c772fddb_0
job-import-plugin:3.6
jobConfigHistory:1229.v3039470161a_d
joda-time-api:2.12.6-21.vca_fd74418fb_7
jquery3-api:3.7.1-1
json-api:20231013-17.v1c97069404b_e
json-path-api:2.9.0-33.v2527142f2e1d
junit:1259.v65ffcef24a_88
mailer:463.vedf8358e006b_
matrix-auth:3.2.1
matrix-project:822.824.v14451b_c0fd42
metrics:4.2.21-449.v6960d7c54c69
mina-sshd-api-common:2.12.0-90.v9f7fb_9fa_3d3b_
mina-sshd-api-core:2.12.0-90.v9f7fb_9fa_3d3b_
monitoring:1.95.0
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-github-lib:42.v0739460cda_c4
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:704.vc58b_8890a_384
pipeline-input-step:477.v339683a_8d55e
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2175.v76a_fff0a_2618
pipeline-model-definition:2.2175.v76a_fff0a_2618
pipeline-model-extensions:2.2175.v76a_fff0a_2618
pipeline-rest-api:2.34
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2175.v76a_fff0a_2618
pipeline-stage-view:2.34
pipeline-utility-steps:2.16.1
plain-credentials:143.v1b_df8b_d3b_e48
plugin-util-api:3.8.0
prism-api:1.29.0-10
pubsub-light:1.18
rebuild:330.v645b_7df10e2a_
resource-disposer:0.23
run-condition:1.7
saml:4.464.vea_cb_75d7f5e0
scm-api:683.vb_16722fb_b_80b_
script-security:1313.v7a_6067dc7087
slack:684.v833089650554
snakeyaml-api:2.2-111.vc6598e30cc65
sse-gateway:1.26
ssh-agent:346.vda_a_c4f2c8e50
ssh-credentials:308.ve4497b_ccd8f4
sshd:3.303.vefc7119b_ec23
structs:337.v1b_04ea_4df7c8
timestamper:1.26
token-macro:400.v35420b_922dcb_
trilead-api:2.133.vfb_8a_7b_9c5dd1
uno-choice:2.8.1
variant:60.v7290fc0eb_b_cd
workflow-aggregator:596.v8c21c963d92d
workflow-api:1291.v51fd2a_625da_7
workflow-basic-steps:1042.ve7b_140c4a_e0c
workflow-cps:3853.vb_a_490d892963
workflow-durable-task-step:1322.v63864b_7a_e384
workflow-job:1385.vb_58b_86ea_fff1
workflow-multibranch:773.vc4fe1378f1d5
workflow-scm-step:415.v434365564324
workflow-step-api:657.v03b_e8115821b_
workflow-support:865.v43e78cc44e0d
ws-cleanup:0.45

What Operating System are you using (both controller, and any agents involved in the problem)?

AWS ECS hosting both the Jenkins instance and various agents.

Reproduction steps

Sorry, I don't have a simple reproduction at this stage. We've recently upgraded from Jenkins 2.346.3 with ecs plugin 1.48 to Jenkins 2.426.3 with ecs plugin 1.49.

We have a scripted pipeline that uses parallel to run multiple ecs nodes concurrently. For capacity reasons, this is limited to launching 3 concurrent nodes through the use of semaphores.

Since the upgrade, our ECS containers are being re-used when running these parallel jobs; previously each node would run a single job and then terminate.
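
For readers following along, the pipeline shape described above might look roughly like the sketch below. This is not the actual pipeline: the branch names are hypothetical, the label 'ecs-agent-name' is borrowed from the reproduction script later in this thread, and the semaphore throttling is left out because the thread does not say which step provides it.

```groovy
// Sketch only: branch names are hypothetical and the semaphore throttling is omitted.
def branches = [:]
['suite-a', 'suite-b', 'suite-c', 'suite-d'].each { name ->
    branches[name] = {
        node('ecs-agent-name') {
            // NODE_NAME identifies the agent that picked up this branch;
            // the same value appearing for two branches means the agent was re-used.
            echo "${name} ran on ${env.NODE_NAME}"
        }
    }
}
parallel branches
```

Without throttling, all branches request agents at once, so re-use is more likely to show up when later branches only start after earlier ones have finished.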

I'm not sure if it's relevant, but the agent containers were also upgraded at the same time to use a newer version of the agent.jar (due to remoting requirements with newer Jenkins).

Is there some configuration option that I can set to ensure our ECS nodes run only a single job and then terminate?

Expected Results

ECS Node runs a single job and then terminates.

Actual Results

ECS Node runs subsequent jobs after completion of the initial job.

Anything else?

No response

Are you interested in contributing a fix?

No response

melbit-michaelw avatar Feb 06 '24 01:02 melbit-michaelw

I've just done some testing on our Jenkins instance with the ECS plugin downgraded to 1.48 and don't see this behaviour. This implies that it's a change in the ECS plugin that has caused it.

What I'm not sure about, and don't have a test case for, is whether the ecs tasks are re-used only within a single pipeline, or whether they are re-used across other pipelines as well.

Either way, this breaks things for our use case, as we rely on each container running only for the duration of its specific node block (we use scripted pipelines). For example, our pipelines create a 'results' directory; when a container is re-used, that directory already exists and the pipeline fails.

Is there a workaround to force containers to be used only once?

melbit-michaelw avatar Feb 06 '24 05:02 melbit-michaelw

Here's a reasonably minimal script that can be used to reproduce the issue:

node('ecs-agent-name') {
  stage("First") {
    sh(script:"""mkdir results""")
  }
}

node('ecs-agent-name') {
  stage("Second") {
    sh(script:"""mkdir results""")
  }
}

The 'First' stage will succeed, and then the 'Second' stage will fail as the results directory now exists due to the unexpected container re-use.
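
This does not stop the container from being re-used, but as a stopgap the node block can be made tolerant of a leftover workspace. A minimal sketch, assuming the only state that matters lives in the workspace directory (anything else left behind in the container, such as files under /tmp or running processes, is not covered):

```groovy
node('ecs-agent-name') {
    stage("Second") {
        // deleteDir() recursively deletes the current directory (here, the workspace),
        // clearing anything left behind by an earlier node block on the same agent.
        deleteDir()
        // mkdir -p succeeds even if the directory already exists.
        sh(script: "mkdir -p results")
    }
}
```

Either change alone is enough to get past the failure shown above; using both is simply belt and braces.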

melbit-michaelw avatar Feb 06 '24 23:02 melbit-michaelw

@melbit-michaelw, I don't think this is a bug but rather the result of a different bug fix.

https://github.com/jenkinsci/amazon-ecs-plugin/issues/326

What is the number of executors you have set per agent?

Stericson avatar Feb 07 '24 12:02 Stericson

Hi @Stericson,

Sorry for the delayed response.

We aren't explicitly setting it anywhere (we are using config-as-code to configure Jenkins), and thus I believe it will be implicitly set to 1.

I ran the script console test code from the issue you linked and got back 1 executor.
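
The code from the linked issue is not quoted in this thread. As an illustration only, a script console check along these lines reports the executor count for each agent currently known to the controller:

```groovy
import jenkins.model.Jenkins

// Print the configured executor count and the number of busy executors
// for every computer (controller and agents) known to this Jenkins instance.
Jenkins.get().computers.each { computer ->
    // The built-in node reports an empty name, so substitute a readable label.
    println "${computer.name ?: 'built-in'}: executors=${computer.numExecutors}, busy=${computer.countBusy()}"
}
```

ECS agents only appear in this list while their tasks are connected, so the check needs to be run while a build is in progress.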

melbit-michaelw avatar Feb 18 '24 22:02 melbit-michaelw