Orphaned build agents accumulating
Hi, and thanks for this great project.
I'm wondering if orphaned build agents are a known issue? I have a few agents running in Docker that are no longer in Jenkins, with the following log message repeating in an endless loop:
Failing to obtain https://**REDACTED**//computer/docker-agent-8ff84c76f8fc//slave-agent.jnlp?encrypt=true
java.io.IOException: Failed to load https://**REDACTED**//computer/docker-agent-8ff84c76f8fc//slave-agent.jnlp?encrypt=true: 404 Not Found
at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:485)
at hudson.remoting.Launcher.run(Launcher.java:316)
at hudson.remoting.Launcher.main(Launcher.java:277)
Waiting 10 seconds before retry
and some others with the following message:
+ [ ! -f /tmp/config.sh ]
+ echo No config, sleeping for 1 second
No config, sleeping for 1 second
+ sleep 1
+ [ ! -f /tmp/config.sh ]
+ echo No config, sleeping for 1 second
No config, sleeping for 1 second
+ sleep 1
+ [ ! -f /tmp/config.sh ]
+ echo No config, sleeping for 1 second
No config, sleeping for 1 second
This PR would "solve" it by letting the container die peacefully if the remoting component receives a 4XX error code.
@witokondoria not exactly. In your case the slave already got its config with the connection details, but it cycles. Here, in the first error, the slave disappeared on the Jenkins side while the container keeps trying to reconnect. There are many reasons why this may happen: 1) e.g. the docker daemon disappeared and Jenkins decided it was time to kill the slave, while the container keeps reconnecting. 2) The container is still on its way to running and Jenkins will place the config soon. Or something was aborted...
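For the "No config" loop in the second log, a bounded wait would let an abandoned container exit instead of sleeping forever. This is only a minimal sketch: the real entrypoint script is not shown in this thread, and the function name `wait_for_config`, the retry limit, and the delay argument are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not the plugin's actual entrypoint script.
# Usage: wait_for_config FILE MAX_TRIES [DELAY_SECONDS]
# Returns 0 once FILE exists, 1 if it is still missing after MAX_TRIES sleeps.
wait_for_config() {
    file=$1
    max_tries=$2
    delay=${3:-1}
    tries=0
    while [ ! -f "$file" ]; do
        if [ "$tries" -ge "$max_tries" ]; then
            echo "No config after $tries tries, giving up" >&2
            return 1
        fi
        echo "No config, sleeping for $delay second"
        sleep "$delay"
        tries=$((tries + 1))
    done
    return 0
}
```

An entrypoint could then call something like `wait_for_config /tmp/config.sh 300 || exit 1`, so a container that never receives its config stops after roughly five minutes. The limit of 300 is an arbitrary illustration, not a recommended value.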
@Starefossen what launcher do you use? JNLP?
Hmm... I run the latest Jenkins from master and it fails with 404. Probably something was changed in JNLP or in remoting.
@Starefossen do you have selinux?
Yes, JNLP on the latest version of Jenkins. No selinux.
@KostyaSha, my comment was related to the endless loop issue (not the waiting for config one)
In my experience, the agent being unregistered from the master happened randomly after a master restart. My PR to the remoting component is meant to address such circumstances (a container trying to connect to a master that won't ever accept that connection). In no way would it solve the underlying issue (phantom unregistering), but it would leave the docker daemon/swarm without dumb-reconnecting containers.
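The retry policy described here (stop on a 4XX, keep retrying on anything else) can be illustrated with a small shell sketch. It is not the code from the remoting PR, which lives in the Java `hudson.remoting.Launcher`; the function names `should_retry` and `fetch_jnlp` and the variables `JNLP_URL` and `RETRY_DELAY` are assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical illustration of "give up on 4XX", not the actual remoting code.

# should_retry STATUS
# Returns 0 (keep retrying) unless STATUS is a 4xx client error.
should_retry() {
    case "$1" in
        4[0-9][0-9]) return 1 ;;  # e.g. 404: the node no longer exists on the master
        *)           return 0 ;;  # 5xx or connection failures may be transient
    esac
}

# fetch_jnlp: poll JNLP_URL (assumed to be set by the caller) until it
# returns 200, or stop as soon as the master answers with a 4xx.
fetch_jnlp() {
    while :; do
        # curl prints the HTTP status code; fall back to 000 if curl itself fails
        status=$(curl -s -o /tmp/slave-agent.jnlp -w '%{http_code}' "$JNLP_URL") || status=000
        [ "$status" = 200 ] && return 0
        if ! should_retry "$status"; then
            echo "Got HTTP $status, giving up" >&2
            return 1
        fi
        echo "Waiting ${RETRY_DELAY:-10} seconds before retry"
        sleep "${RETRY_DELAY:-10}"
    done
}
```

With a policy like this, a container whose node was removed from the master exits on the first 404 and the docker daemon can clean it up, while a master that is merely restarting (connection refused or 5xx) is still retried.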
https://gist.github.com/KostyaSha/71be4eb3c6359d6c52da6aec56506dc4
I don't understand why the docker container is getting a 404 while I can open this page...
@Starefossen I'm not sure, but it seems they stay only when the JNLP launcher failed to start and the retention strategy doesn't remove them. Could you post what you see in the node's launch log (go to node -> launch log)?
And one more thing is the queue lock, which will be removed in #201
@witokondoria I had Jenkins with security disabled, but slaves were getting 404! I enabled security and they connected