docker-inbound-agent
Agent instance fails to connect to master despite port being open
Installing Jenkins on GKE using the official Helm chart.
I have used jnlp images with both the 3.27-1 and 3.40-1 tags.
When starting a simple (shell execution) job, the agent pod starts running but then gets terminated with an error. Its error logs are the following:
jenkins-agent-5j324 jnlp java.io.IOException: Failed to connect to http://jenkins-inception.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
jenkins-agent-5j324 jnlp at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:196)
jenkins-agent-5j324 jnlp at hudson.remoting.Engine.innerRun(Engine.java:523)
jenkins-agent-5j324 jnlp at hudson.remoting.Engine.run(Engine.java:474)
jenkins-agent-5j324 jnlp Caused by: java.net.ConnectException: Connection refused (Connection refused)
jenkins-agent-5j324 jnlp at java.net.PlainSocketImpl.socketConnect(Native Method)
jenkins-agent-5j324 jnlp at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
jenkins-agent-5j324 jnlp at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
jenkins-agent-5j324 jnlp at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
jenkins-agent-5j324 jnlp at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
jenkins-agent-5j324 jnlp at java.net.Socket.connect(Socket.java:589)
jenkins-agent-5j324 jnlp at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
jenkins-agent-5j324 jnlp at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
jenkins-agent-5j324 jnlp at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
jenkins-agent-5j324 jnlp at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
jenkins-agent-5j324 jnlp at sun.net.www.http.HttpClient.New(HttpClient.java:339)
jenkins-agent-5j324 jnlp at sun.net.www.http.HttpClient.New(HttpClient.java:357)
jenkins-agent-5j324 jnlp at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
jenkins-agent-5j324 jnlp at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
jenkins-agent-5j324 jnlp at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
jenkins-agent-5j324 jnlp at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
jenkins-agent-5j324 jnlp at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:193)
jenkins-agent-5j324 jnlp ... 2 more
jenkins-agent-5j324 jnlp
I have created a test pod within the same master/agent namespace and no connectivity issue seems to exist:
/ # dig +short jenkins-inception.jenkins.svc.cluster.local
10.14.203.189
/ # nc -zv -w 3 jenkins-inception.jenkins.svc.cluster.local 8080
jenkins-inception.jenkins.svc.cluster.local (10.14.203.189:8080) open
/ # curl http://jenkins-inception.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/
Jenkins
Environment:
- cloud provider: GCP
- master tag: lts
- agent tag: 3.27-1 and 3.40-1
- helm version:
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
- kubernetes version:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}
- istio version:
1.4.0
I believe this is happening because the Envoy proxy takes some time to set itself up, and the jnlp container tries to connect while that is still in progress. I have had similar issues with recent versions of istio. Unfortunately I don't have a fix yet.
One solution would be for jnlp-slave to retry this connection instead of giving up on the first failure.
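A minimal sketch of what such a retry wrapper could look like, assuming curl is available in the agent image and JENKINS_URL points at the controller (the attempt count and sleep interval are arbitrary):

attempts=12
while [ "$attempts" -gt 0 ]; do
  if curl -sf -o /dev/null "${JENKINS_URL}/tcpSlaveAgentListener/"; then
    # Listener answered; hand off to the stock launcher
    exec jenkins-agent "$@"
  fi
  attempts=$((attempts - 1))
  echo "Controller not reachable yet (${attempts} attempts left), sleeping 5s"
  sleep 5
done
echo "Controller never became reachable, giving up" >&2
exit 1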
I can also confirm that this occurs on a GKE cluster using istio 1.4.0 but NOT on another one using an older version of istio, e.g. 1.1.15
Following up on @timmyers' comment: this is exactly what I was observing. I built a custom jnlp image that leverages wait-for-it to make sure the pod is able to connect to Jenkins prior to launching jenkins-agent. This solved the connectivity issue; from my testing it takes about 3s on our cluster for the connection to become available.
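The image itself isn't shown in this thread, so below is only a hypothetical reconstruction of that approach: a Dockerfile layering wait-for-it (https://github.com/vishnubob/wait-for-it) over the stock image. The base tag, controller host, and 60s timeout are illustrative:

FROM jenkins/inbound-agent:3.40-1
USER root
# wait-for-it is a small bash script that blocks until a TCP host:port is reachable
ADD https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh /usr/local/bin/wait-for-it
RUN chmod +x /usr/local/bin/wait-for-it
USER jenkins
# Wait (up to 60s) for the controller port, then start the normal agent launcher
ENTRYPOINT ["wait-for-it", "jenkins-inception.jenkins.svc.cluster.local:8080", "-t", "60", "--", "jenkins-agent"]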
Guys, I am facing this issue when running the Jenkins service (a Windows service on 127.0.0.1:8080) outside the minikube cluster.
@aspring could you please share the details of how you made the custom image and how you added the wait?
If your slave is outside the cluster, then you have to expose the master's service via a NodePort. After that you can connect the slave from outside the cluster to the master inside it.
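A minimal sketch of such a Service, assuming the controller pod carries an app: jenkins label and uses the default ports (8080 for HTTP, 50000 for inbound agents); the nodePort values are arbitrary picks from the NodePort range:

apiVersion: v1
kind: Service
metadata:
  name: jenkins-external
spec:
  type: NodePort
  selector:
    app: jenkins        # assumed controller pod label
  ports:
    - name: http
      port: 8080
      targetPort: 8080
      nodePort: 30080
    - name: agent       # inbound (JNLP) agent port
      port: 50000
      targetPort: 50000
      nodePort: 30050

The agent outside the cluster would then use http://<node-ip>:30080 as the Jenkins URL and <node-ip>:30050 as the tunnel.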
I'm facing this issue! Is there any option other than modifying the jnlp image? istio: 1.6.8, jnlp: 4.3-4
I tried to modify the configMap for jenkins-agent, adding "sleep 10; jenkins-agent" to the command, but it did not work: <command>sh -c "sleep 10; jenkins-agent"</command>
logs:
SEVERE: Failed to connect to https://xxxx-jenkins.xxxx.svc:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
java.io.IOException: Failed to connect to https://xxxx-jenkins.xxxx.svc:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)
Facing the same issue on istio 1.2.0. If you run Jenkins and the Jenkins slave on plain Kubernetes, everything works fine.
root@ubuntu:~# kubectl logs po/jenkins-slave-jrz8f -n jenkins
Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: jenkins-slave-jrz8f
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 01, 2021 1:00:07 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.20
Jun 01, 2021 1:00:07 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Both error and output logs will be printed to /home/jenkins/agent/remoting
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local/]
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local/tcpSlaveAgentListener/: Connection refused (Connection refused)
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local/tcpSlaveAgentListener/: Connection refused (Connection refused)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:192)
at hudson.remoting.Engine.innerRun(Engine.java:518)
at hudson.remoting.Engine.run(Engine.java:469)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:189)
... 2 more
I am getting the same issue
Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: agent-pkznn
Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main$CuiListener
I can also confirm that this occurs on a GKE cluster using istio 1.4.0 but NOT on another one using an older version of istio, e.g. 1.1.15
I am getting it on istio 1.7.3 and GKE version 1.20.10-gke.301
The recommendation appears to be to add a bit of a sleep, a wait-for-it check, or a retry.
Happy for a fix in either this repo, or in say https://github.com/jenkinsci/remoting
cc @jeffret-b
Thanks @timja for the workaround. Indeed it worked for us by modifying the agent's entrypoint in the k8s pod template.
Adding the following to the Windows jnlp agent (jenkins/inbound-agent):
command:
- "powershell.exe"
args:
- "Start-Sleep"
- "-s"
- "5"
- ";"
- "powershell.exe"
- "-f"
- "C:/ProgramData/Jenkins/jenkins-agent.ps1"
And it all works fine again (well... 5s slower).
Just FYI, this started happening on a new EKS 1.21 cluster with mixed ARM and AMD instances, plus Windows nodes. It only happens on the Windows nodes, which have no kube-proxy and depend on VPC webhooks, so perhaps that would explain the istio-like network experience of the pod.
How do I do this if I'm not using Kubernetes? How do I add the sleep?
Modifying one of the startup scripts is easiest:
https://github.com/jenkinsci/docker-inbound-agent/blob/master/jenkins-agent or https://github.com/jenkinsci/docker-inbound-agent/blob/master/jenkins-agent.ps1
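For example, a crude version of that edit, added near the top of the jenkins-agent shell script (STARTUP_DELAY is a made-up variable name, 10s an arbitrary default):

# Hypothetical: give the network a head start before the first connection attempt
sleep "${STARTUP_DELAY:-10}"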
Updating the pod template might help as well:
spec:
containers:
- name: jnlp
image: jenkins/inbound-agent:4.3-4-jdk11
command: ["/bin/sh","-c"]
args: ["sleep 30; /usr/local/bin/jenkins-agent"]
We are seeing similar issues, only for Windows nodes as well. Could we add a readiness probe to the pod template, I wonder? And if so, what would that look like?
Hi @psimms-r7, as per https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/, readiness probes are not usable here:
The kubelet uses readiness probes to know when a container is ready to start accepting traffic.
=> The inbound agents connect to the Jenkins controller, not the other way around. Unless you meant a readiness probe for the Jenkins controller itself in Kubernetes? (If so, look at the Helm chart values: https://github.com/jenkinsci/helm-charts/blob/48f2acfaeec059de23d5b1065757ba8bb4621e0a/charts/jenkins/VALUES_SUMMARY.md#kubernetes-health-probes.)
=> You could use a startup probe though (with a Kubernetes version supporting it): https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes.
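A sketch of what that could look like on the jnlp container, using an exec probe that curls the controller (period and threshold are arbitrary):

spec:
  containers:
    - name: jnlp
      image: jenkins/inbound-agent:4.3-4
      startupProbe:
        exec:
          command:
            - sh
            - -c
            - curl -sf "$JENKINS_URL/tcpSlaveAgentListener/"
        periodSeconds: 5
        failureThreshold: 12   # i.e. allow up to ~60s for the network to come up

One caveat: a startup probe gates liveness/readiness and the pod's Ready state rather than delaying the container's entrypoint, so jenkins-agent may still make its first attempt early; pairing the probe with a retry or sleep in the entrypoint is safer.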
Apologies, you're right, something like a startup probe - could we just do a curl on the agent listener?
The error we are seeing is slightly different actually - UnknownHostException
Error
INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local:8080/]
Oct 25, 2022 11:25:51 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins.jenkins.svc.cluster.local
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins.jenkins.svc.cluster.local
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)
at hudson.remoting.Engine.innerRun(Engine.java:693)
at hudson.remoting.Engine.run(Engine.java:518)
Caused by: java.net.UnknownHostException: jenkins.jenkins.svc.cluster.local
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1253)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
... 2 more
I am experimenting with our custom inbound agent image, tweaking the jenkins-agent.ps1 script with the below wrapped around the Start-Process call. This appears to have improved things. Note: I rarely use PowerShell, so I am sure this can be much improved, but would something like this make sense to be merged up to master?
$attempt = 6
$success = $false
while ($attempt -gt 0 -and -not $success) {
    try {
        $Response = Invoke-WebRequest -UseBasicParsing -Uri "$env:JENKINS_URL/tcpSlaveAgentListener"
        if ($?) {
            Write-Host "AgentListener active"
            Start-Process -FilePath $JAVA_BIN -Wait -NoNewWindow -ArgumentList $AgentArguments
            # Mark success so the loop exits once the agent process has finished
            $success = $true
        }
        else {
            Write-Host "AgentListener failed"
        }
    }
    catch {
        # Listener not reachable yet: back off and retry, up to 6 attempts
        $attempt--
        Start-Sleep -s 10
        Write-Host "Failed"
        Write-Host $_
    }
}
I never played around with startup probes, but it looks like the right way to achieve this. Your idea looks really good: a startup probe to curl the Jenkins controller listener. Alternatively, an initContainer added to the pod.
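A sketch of the initContainer variant (the image and service URL are illustrative):

spec:
  initContainers:
    - name: wait-for-controller
      image: curlimages/curl    # any small image with curl would do
      command:
        - sh
        - -c
        - until curl -sf http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/; do echo waiting; sleep 3; done
  containers:
    - name: jnlp
      image: jenkins/inbound-agent:4.3-4

One caveat: with istio sidecar injection, init containers run before the Envoy sidecar is up, so in the mesh setups earlier in this thread the init container itself may have no network; it is a better fit for the Windows/EKS cases.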
The error comes from DNS resolution in your case. The UnknownHostException is pretty clear: it is NOT related to the image itself or your PowerShell code.
- Could be worth checking DNS resolution with an interactive shell in your Jenkins agent Windows pod: can it resolve an external domain such as google.com? (See the command sketch below.)
- Can you confirm that your Jenkins controller is running in a pod named jenkins in the namespace jenkins?
- If you have a Linux pod, can you try a Linux Jenkins agent with the same JENKINS_URL to see if it works?
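For the DNS checks, something like this from an interactive shell in the Windows agent pod (assuming the image ships PowerShell with the DnsClient module; nslookup works as a fallback):

# e.g. kubectl exec -it <your-agent-pod> -- powershell
Resolve-DnsName google.com                          # external name
Resolve-DnsName jenkins.jenkins.svc.cluster.local   # in-cluster service name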
=> It reminds me of https://github.com/microsoft/Windows-Containers/issues/61 (if it helps)