yet-another-docker-plugin

Orphaned build agents accumulating

Open Starefossen opened this issue 8 years ago • 11 comments

Hi, and thanks for this great project.

I'm wondering if orphaned build agents are a known issue? I have a few agents running in Docker that are no longer in Jenkins, with the following log message repeating in an endless loop:

Failing to obtain https://**REDACTED**//computer/docker-agent-8ff84c76f8fc//slave-agent.jnlp?encrypt=true
java.io.IOException: Failed to load https://**REDACTED**//computer/docker-agent-8ff84c76f8fc//slave-agent.jnlp?encrypt=true: 404 Not Found
        at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:485)
        at hudson.remoting.Launcher.run(Launcher.java:316)
        at hudson.remoting.Launcher.main(Launcher.java:277)
Waiting 10 seconds before retry

and some others with the following message:

+ [ ! -f /tmp/config.sh ]
+ echo No config, sleeping for 1 second
No config, sleeping for 1 second
+ sleep 1
+ [ ! -f /tmp/config.sh ]
+ echo No config, sleeping for 1 second
No config, sleeping for 1 second
+ sleep 1
+ [ ! -f /tmp/config.sh ]
+ echo No config, sleeping for 1 second
No config, sleeping for 1 second

Starefossen avatar Oct 09 '17 11:10 Starefossen

This PR would "solve" it, leaving the container to die peacefully if the remoting component receives a 4XX error code.

witokondoria avatar Oct 09 '17 11:10 witokondoria

@witokondoria not exactly. In your case the slave already got its config with the connection details, but loops. Here, in the first error, the slave disappeared from Jenkins while the container keeps trying to reconnect. There are many reasons why this may happen: 1) e.g. the Docker daemon disappeared and Jenkins decided it was time to kill the slave, while the container keeps reconnecting. 2) The container is on the way to run and Jenkins will place the config soon. Or something was aborted...

@Starefossen what launcher do you use? JNLP?

KostyaSha avatar Oct 09 '17 21:10 KostyaSha

Hmm... I ran the latest Jenkins from master and it fails with 404. Probably something was changed in JNLP or in remoting.

KostyaSha avatar Oct 09 '17 21:10 KostyaSha

@Starefossen do you have SELinux?

KostyaSha avatar Oct 09 '17 22:10 KostyaSha

Yes, JNLP on the latest version of Jenkins. No SELinux.

Starefossen avatar Oct 10 '17 05:10 Starefossen

@KostyaSha, my comment was related to the endless loop issue (not the waiting-for-config one).

In my experience, the agent being unregistered from the master happened randomly after a master restart. My PR to the remoting component is meant to address such circumstances (a container trying to connect to a master that won't ever accept that connection). In no way would it solve the underlying issue (phantom unregistering), but it will leave the Docker daemon/swarm without dumb-reconnecting containers.
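To illustrate the idea only (a rough sketch, not the actual remoting change): the retry loop would treat a 4XX answer as "the master does not know this agent" and exit instead of waiting and retrying forever. The names `is_permanent_failure` and `fetch_with_retry`, the attempt limit, and the injected `fetch` callable are all hypothetical and exist just for this example; the 10-second delay mirrors the "Waiting 10 seconds before retry" line in the log above.

```python
import time
from typing import Callable


def is_permanent_failure(status: int) -> bool:
    """Assumption for this sketch: any 4XX means retrying will not help."""
    return 400 <= status < 500


def fetch_with_retry(
    fetch: Callable[[], int],
    retry_delay: float = 10.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call fetch() until it returns a 2XX status.

    fetch is a hypothetical callable that requests the agent's JNLP
    endpoint and returns the HTTP status code.
    Returns True on success, False when the agent should give up and exit.
    """
    for attempt in range(1, max_attempts + 1):
        status = fetch()
        if 200 <= status < 300:
            return True
        if is_permanent_failure(status):
            # e.g. 404: the node was removed from the master, so stop here
            return False
        if attempt < max_attempts:
            # transient error (e.g. 5XX): wait before the next attempt
            sleep(retry_delay)
    return False
```

With something like this, a container whose node was deleted would stop after the first 404 and let Docker reap it, while a master that is merely restarting (5XX) would still get a few retries. A real implementation would probably want to keep retrying on some 4XX codes such as 408 or 429.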

witokondoria avatar Oct 10 '17 08:10 witokondoria

https://gist.github.com/KostyaSha/71be4eb3c6359d6c52da6aec56506dc4

KostyaSha avatar Oct 10 '17 12:10 KostyaSha

I don't understand why the Docker container is getting a 404 while I can open this page...

KostyaSha avatar Oct 10 '17 17:10 KostyaSha

@Starefossen I'm not sure, but it seems they stay only when the JNLP launcher failed to start and the retention strategy doesn't remove them. Could you post what you see in the node's launch log (go to node -> launch log)?

KostyaSha avatar Oct 11 '17 00:10 KostyaSha

One more thing is the queue lock, which will be removed in #201.

KostyaSha avatar Oct 11 '17 00:10 KostyaSha

@witokondoria I had Jenkins with security disabled, but the slaves were getting 404! I enabled security and they connected.

KostyaSha avatar Oct 11 '17 01:10 KostyaSha