
Agent instance fails to connect to master despite port being open

Open · pkaramol opened this issue 5 years ago · 15 comments

Installing Jenkins on GKE using the official Helm chart.

I have used jnlp images with both tags 3.27-1 and 3.40-1.

When starting a simple (shell execution) job, the agent pod starts running but then gets terminated with an error. Its error logs are the following:

jenkins-agent-5j324 jnlp java.io.IOException: Failed to connect to http://jenkins-inception.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
jenkins-agent-5j324 jnlp 	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:196)
jenkins-agent-5j324 jnlp 	at hudson.remoting.Engine.innerRun(Engine.java:523)
jenkins-agent-5j324 jnlp 	at hudson.remoting.Engine.run(Engine.java:474)
jenkins-agent-5j324 jnlp Caused by: java.net.ConnectException: Connection refused (Connection refused)
jenkins-agent-5j324 jnlp 	at java.net.PlainSocketImpl.socketConnect(Native Method)
jenkins-agent-5j324 jnlp 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
jenkins-agent-5j324 jnlp 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
jenkins-agent-5j324 jnlp 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
jenkins-agent-5j324 jnlp 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
jenkins-agent-5j324 jnlp 	at java.net.Socket.connect(Socket.java:589)
jenkins-agent-5j324 jnlp 	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
jenkins-agent-5j324 jnlp 	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
jenkins-agent-5j324 jnlp 	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
jenkins-agent-5j324 jnlp 	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
jenkins-agent-5j324 jnlp 	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
jenkins-agent-5j324 jnlp 	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
jenkins-agent-5j324 jnlp 	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:193)
jenkins-agent-5j324 jnlp 	... 2 more
jenkins-agent-5j324 jnlp

I have created a test pod within the same master/agent namespace and no connectivity issue seems to exist:

/ # dig +short jenkins-inception.jenkins.svc.cluster.local
10.14.203.189
/ # nc -zv -w 3 jenkins-inception.jenkins.svc.cluster.local 8080
jenkins-inception.jenkins.svc.cluster.local (10.14.203.189:8080) open
/ # curl http://jenkins-inception.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/


  Jenkins

Environment:

  • cloud provider: GCP
  • master tag: lts
  • agent tag: 3.27-1 and 3.40-1
  • helm version:
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
  • kubernetes version:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-gke.14", GitCommit:"56d89863d1033f9668ddd6e1c1aea81cd846ef88", GitTreeState:"clean", BuildDate:"2019-11-07T19:12:22Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}
  • istio version: 1.4.0

pkaramol (Jan 28 '20)

I believe this is happening because the Envoy proxy takes some time to set things up, and the jnlp container tries to make a connection while this is still in progress. I have had similar issues with recent versions of istio. Unfortunately I don't have a fix yet.

One solution would be for jnlp-slave to retry this connection instead of giving up on the first failure.
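For reference, istio 1.7 and newer expose a proxy option that holds application containers until the sidecar is ready, which targets exactly this race. A minimal sketch of the per-pod annotation (illustrative; requires an istio version that supports it):

    metadata:
      annotations:
        # Hold the jnlp container until the Envoy sidecar reports ready
        # (ProxyConfig override; available from istio 1.7 onwards).
        proxy.istio.io/config: |
          holdApplicationUntilProxyStarts: true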

timmyers (Feb 28 '20)

I can also confirm that this occurs on a GKE cluster using istio 1.4.0 but NOT on another one using an older version of istio, e.g. 1.1.15

pkaramol (Feb 29 '20)

Following up on @timmyers' comment: this is exactly what I was observing. I built a custom jnlp image that leverages wait-for-it to make sure the pod is able to connect to Jenkins prior to launching jenkins-agent. This solved the connectivity issue; from my testing it's about a 3s delay on our cluster before the connection becomes available.
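A minimal sketch of what such an image could look like (a hypothetical reconstruction; @aspring did not share the actual Dockerfile, and the controller address below is an assumption that must match your service):

    FROM jenkins/inbound-agent:4.3-4
    USER root
    # wait-for-it.sh blocks until the given TCP endpoint accepts connections
    ADD https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh /usr/local/bin/wait-for-it
    RUN chmod +x /usr/local/bin/wait-for-it
    USER jenkins
    # Wait up to 60s for the controller, then hand over to the stock entrypoint
    ENTRYPOINT ["/bin/sh", "-c", "wait-for-it jenkins.jenkins.svc.cluster.local:8080 -t 60 && exec jenkins-agent \"$@\"", "sh"]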

aspring (Apr 12 '20)

Guys, I am facing this issue when running the Jenkins service (a Windows service on 127.0.0.1:8080) outside the minikube cluster.

abhishekkarigar (Aug 02 '20)

@aspring could you please share the details, like how you made the custom image and how you added the wait?

yogesh9391 (Sep 10 '20)

> Guys, I am facing this issue when running the Jenkins service (a Windows service on 127.0.0.1:8080) outside the minikube cluster.

If your slave is outside the cluster, then you have to use a NodePort to expose the master's service. After that you can connect the slave from outside the cluster to the master inside it.
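A minimal sketch of such a Service (illustrative names and ports; the selector must match your controller's actual labels):

    apiVersion: v1
    kind: Service
    metadata:
      name: jenkins-external          # hypothetical name
      namespace: jenkins
    spec:
      type: NodePort
      selector:
        app.kubernetes.io/name: jenkins   # must match the controller pod labels
      ports:
      - name: http
        port: 8080      # web UI and tcpSlaveAgentListener HTTP endpoint
        nodePort: 30080
      - name: inbound-agent
        port: 50000     # TCP port agents use for the remoting channel
        nodePort: 30500

The external agent can then be started with -url http://<node-ip>:30080/ and, if needed, -tunnel <node-ip>:30500.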

deepan10 (Sep 11 '20)

I'm facing this issue too! Is there any fix other than modifying the jnlp image? istio: 1.6.8, jnlp: 4.3-4

I tried to modify the configMap for the jenkins-agent by adding "sleep 10; jenkins-agent" to the command, but it did not work:

    <command>sh -c "sleep 10; jenkins-agent"</command>

logs:

SEVERE: Failed to connect to https://xxxx-jenkins.xxxx.svc:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
java.io.IOException: Failed to connect to https://xxxx-jenkins.xxxx.svc:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)

anthonyGuo (Nov 09 '20)

Facing the same issue on istio 1.2.0. If you run Jenkins and the Jenkins slave on plain Kubernetes (without istio), everything works fine.

root@ubuntu:~# kubectl logs po/jenkins-slave-jrz8f -n jenkins

Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: jenkins-slave-jrz8f
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 01, 2021 1:00:07 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.20
Jun 01, 2021 1:00:07 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Both error and output logs will be printed to /home/jenkins/agent/remoting
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local/]
Jun 01, 2021 1:00:07 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local/tcpSlaveAgentListener/: Connection refused (Connection refused)
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local/tcpSlaveAgentListener/: Connection refused (Connection refused)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:192)
	at hudson.remoting.Engine.innerRun(Engine.java:518)
	at hudson.remoting.Engine.run(Engine.java:469)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:189)
	... 2 more

839928622 (Jun 01 '21)

I am getting the same issue

Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: agent-pkznn
Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Oct 27, 2021 4:23:18 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.11
Oct 27, 2021 4:23:18 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Oct 27, 2021 4:23:18 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://fin-orchestration-jenkins-service.fssre.svc.cluster.local:8080/]
Oct 27, 2021 4:23:18 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://fin-orchestration-jenkins-service.fssre.svc.cluster.local:8080/tcpSlaveAgentListener/: Connection refused (Connection refused)
java.io.IOException: Failed to connect to http://fin-orchestration-jenkins-service.fssre.svc.cluster.local:8080/tcpSlaveAgentListener/: Connection refused (Connection refused)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
	at hudson.remoting.Engine.innerRun(Engine.java:724)
	at hudson.remoting.Engine.run(Engine.java:540)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
	at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
	at java.base/java.net.Socket.connect(Unknown Source)
	at java.base/sun.net.NetworkClient.doConnect(Unknown Source)
	at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source)
	at java.base/sun.net.www.http.HttpClient.openServer(Unknown Source)
	at java.base/sun.net.www.http.HttpClient.<init>(Unknown Source)
	at java.base/sun.net.www.http.HttpClient.New(Unknown Source)
	at java.base/sun.net.www.http.HttpClient.New(Unknown Source)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:211)
	... 2 more

mb250315 (Oct 27 '21)

> I can also confirm that this occurs on a GKE cluster using istio 1.4.0 but NOT on another one using an older version of istio, e.g. 1.1.15

I am getting it on istio 1.7.3 and GKE version 1.20.10-gke.301

mb250315 (Oct 27 '21)

The recommendation appears to be to add a bit of a sleep / wait-for-it / a retry.

Happy for a fix in either this repo, or in, say, https://github.com/jenkinsci/remoting

cc @jeffret-b

timja (Oct 27 '21)

Thanks @timja for the workaround. Indeed it worked for us by modifying the agent's entrypoint in the k8s pod template. Adding the following to the Windows jnlp agent (jenkins/inbound-agent):

    command:
    - "powershell.exe"
    args:
    - "Start-Sleep"
    - "-s"
    - "5"
    - ";"
    - "powershell.exe"
    - "-f"
    - "C:/ProgramData/Jenkins/jenkins-agent.ps1"

And it all works fine again (well... 5s slower).

Just FYI, this started happening on a new EKS 1.21 cluster with mixed ARM and AMD instances, plus Windows nodes. It only happens on the Windows nodes, which have no kube-proxy and depend on VPC webhooks, so perhaps that would explain the istio-like network experience of the pod.

jmcastellote (Nov 18 '21)

> Thanks @timja for the workaround. Indeed it worked for us by modifying the agent's entrypoint in the k8s pod template. [...]

How do I do this if I'm not using Kubernetes? How do I add the sleep?

hg13190 (Dec 12 '21)

Modifying one of the startup scripts is easiest:

https://github.com/jenkinsci/docker-inbound-agent/blob/master/jenkins-agent or https://github.com/jenkinsci/docker-inbound-agent/blob/master/jenkins-agent.ps1
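A sketch of what such a change could look like on the Linux side (illustrative only; assumes curl is available in the image and that JENKINS_URL is set, as it is for pods launched by the kubernetes plugin):

    # Hypothetical snippet near the top of jenkins-agent: poll the
    # controller's agent listener before handing over to remoting.
    tries=30
    until curl -sSf "${JENKINS_URL%/}/tcpSlaveAgentListener/" >/dev/null 2>&1; do
      tries=$((tries - 1))
      if [ "$tries" -le 0 ]; then
        echo "Controller at ${JENKINS_URL} never became reachable" >&2
        exit 1
      fi
      echo "Waiting for ${JENKINS_URL} ..." >&2
      sleep 2
    done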

timja (Dec 12 '21)

Updating the pod template might help as well:

spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:4.3-4-jdk11
    command: ["/bin/sh","-c"]
    args: ["sleep 30; /usr/local/bin/jenkins-agent"]

sasha-bachurin (Mar 24 '22)

> Thanks @timja for the workaround. Indeed it worked for us by modifying the agent's entrypoint in the k8s pod template. [...]

We are seeing similar issues, only for Windows nodes as well. Could we add a readiness probe to the pod template, I wonder, and if so, what would that look like?

psimms-r7 (Oct 24 '22)

> We are seeing similar issues, only for Windows nodes as well. Could we add a readiness probe to the pod template, I wonder, and if so, what would that look like?

Hi @psimms-r7, as per https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/, readiness probes are not usable here:

> The kubelet uses readiness probes to know when a container is ready to start accepting traffic.

=> The inbound agents connect to the Jenkins controller, not the other way around. Unless you meant a readiness probe for the Jenkins controller itself in Kubernetes? (If yes, then look at the helm chart values: https://github.com/jenkinsci/helm-charts/blob/48f2acfaeec059de23d5b1065757ba8bb4621e0a/charts/jenkins/VALUES_SUMMARY.md#kubernetes-health-probes.)

=> You could use a startup probe though (with a Kubernetes version supporting it): https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes. A sketch follows below.
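One caveat: a startup probe does not delay the container's entrypoint; it only gates the other probes and, on repeated failure, has the kubelet restart the container, which acts as a crude retry. A sketch of probing the controller from the agent pod (illustrative; assumes curl exists in the agent image, and note that agent pods created by the kubernetes plugin typically run with restartPolicy: Never, in which case a failed probe ends the pod instead of retrying):

    containers:
    - name: jnlp
      image: jenkins/inbound-agent:4.3-4
      startupProbe:
        exec:
          # Succeeds once the controller's agent listener answers over HTTP
          command: ["sh", "-c", "curl -sf \"$JENKINS_URL/tcpSlaveAgentListener/\""]
        periodSeconds: 5
        failureThreshold: 12   # allow up to ~60s for the controller to appear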

dduportal (Oct 24 '22)

> [...] readiness probes are not usable here [...] You could use a startup probe though (with a Kubernetes version supporting it): https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes.

Apologies, you're right, something like a startup probe - could we just do a curl on the agent listener?

psimms-r7 (Oct 24 '22)

The error we are seeing is slightly different actually - UnknownHostException

Error

INFO: Locating server among [http://jenkins.jenkins.svc.cluster.local:8080/]
Oct 25, 2022 11:25:51 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins.jenkins.svc.cluster.local
java.io.IOException: Failed to connect to http://jenkins.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins.jenkins.svc.cluster.local
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)
	at hudson.remoting.Engine.innerRun(Engine.java:693)
	at hudson.remoting.Engine.run(Engine.java:518)
Caused by: java.net.UnknownHostException: jenkins.jenkins.svc.cluster.local
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
	at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
	at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1253)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
	... 2 more

I am experimenting with our custom inbound agent image, tweaking the jenkins-agent.ps1 script with the below wrapped around the Start-Process call. This appears to have improved things. Note: I rarely use PowerShell, so I am sure this can be much improved, but would something like this make sense to be merged up to master?

    $attempt = 6
    $success = $false
    while ($attempt -gt 0 -and -not $success) {
        try {
            $Response = Invoke-WebRequest -UseBasicParsing -Uri "$env:JENKINS_URL/tcpSlaveAgentListener"
            if ($?) {
                Write-Host "AgentListener active"
                # Mark success so the loop exits once the agent process finishes
                $success = $true
                Start-Process -FilePath $JAVA_BIN -Wait -NoNewWindow -ArgumentList $AgentArguments
            }
            else {
                Write-Host "AgentListener failed"
            }
        }
        catch {
            $attempt--
            Start-Sleep -s 10
            Write-Host "Failed"
            Write-Host $_
        }
    }

psimms-r7 (Oct 25 '22)

> Apologies, you're right, something like a startup probe - could we just do a curl on the agent listener?

I never played around with startup probes, but it looks like the right way to achieve this. Your idea looks really good: a startup probe that curls the Jenkins controller listener. Alternatively, an initContainer added to the pod, as sketched below.
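A sketch of the initContainer variant (illustrative names; note that with an Istio sidecar, init containers may themselves lack network access until the proxy is up, so this fits the sidecar-less Windows/EKS case better):

    spec:
      initContainers:
      - name: wait-for-controller
        image: curlimages/curl:8.4.0   # any image with curl works
        command: ["sh", "-c"]
        args:
        - |
          until curl -sf "$JENKINS_URL/tcpSlaveAgentListener/"; do
            echo "waiting for controller..."; sleep 2
          done
        env:
        - name: JENKINS_URL
          value: "http://jenkins.jenkins.svc.cluster.local:8080"   # hypothetical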

dduportal (Oct 25 '22)

> The error we are seeing is slightly different actually - UnknownHostException [...] I am experimenting with our custom inbound agent image, tweaking the jenkins-agent.ps1 script with the below wrapped around the Start-Process call [...]

The error comes from DNS resolution in your case. The UnknownHostException is pretty clear: it is NOT related to the image itself or to your PowerShell code.

  • It could be worth checking DNS resolution with an interactive shell in your Jenkins agent Windows pod: can it resolve an external domain such as google.com? (See the sketch below.)
  • Can you confirm that your Jenkins controller is reachable behind a Service named jenkins in the namespace jenkins?
  • If you have a Linux pod, can you try a Linux Jenkins agent with the same JENKINS_URL to see if it works?

=> It reminds me of https://github.com/microsoft/Windows-Containers/issues/61 (if it helps)
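A sketch of those checks from an interactive shell inside the Windows pod (illustrative; Resolve-DnsName and Test-NetConnection ship with Windows PowerShell):

    # Can the pod resolve external and in-cluster names?
    Resolve-DnsName google.com
    Resolve-DnsName jenkins.jenkins.svc.cluster.local
    # Once the name resolves, is the controller's HTTP port reachable?
    Test-NetConnection jenkins.jenkins.svc.cluster.local -Port 8080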

dduportal (Oct 25 '22)