fix: add exponential backoff

Open PertsevRoman opened this issue 3 years ago • 1 comments

I ran into DNS/timeout issues during agent initialization.

WebSocket mode

io.jenkins.remoting.shaded.javax.websocket.DeploymentException: Connection failed.
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer$1.call(JdkClientContainer.java:187)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer$1.call(JdkClientContainer.java:107)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer.openClientSocket(JdkClientContainer.java:192)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:647)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:696)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:849)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
	at hudson.remoting.Engine.runWebSocket(Engine.java:656)
	at hudson.remoting.Engine.run(Engine.java:495)
Caused by: java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:104)
	at sun.nio.ch.UnixAsynchronousSocketChannelImpl.implConnect(UnixAsynchronousSocketChannelImpl.java:302)
	at sun.nio.ch.AsynchronousSocketChannelImpl.connect(AsynchronousSocketChannelImpl.java:210)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter.handleConnect(TransportFilter.java:184)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.connect(Filter.java:80)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.connect(Filter.java:83)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.connect(Filter.java:83)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.connect(ClientFilter.java:99)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer.connectSynchronously(JdkClientContainer.java:326)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer.access$700(JdkClientContainer.java:58)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer$1.call(JdkClientContainer.java:156)
	... 12 more

TCP socket mode

java.io.IOException: Failed to connect to http://jenkins-url/tcpSlaveAgentListener/: connect timed out
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
	at hudson.remoting.Engine.innerRun(Engine.java:733)
	at hudson.remoting.Engine.run(Engine.java:539)
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1228)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:211)
	... 2 more

I'm still trying to pin down the exact cause. It seems to be related to some K8s/JDK networking corner case. In any case, this PR introduces an exponential backoff as a workaround, which resolves the issue at the source code level. It can also help avoid similar transient network issues.
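
For readers unfamiliar with the technique, here is a minimal sketch of exponential backoff in general. It is not the code from this PR: `BackoffRetry`, `delayMillis`, and `retry` are hypothetical names, and the base delay, cap, and attempt count are assumed parameters. The wait doubles after each failed attempt and is capped at a maximum.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of Jenkins remoting. */
final class BackoffRetry {
    private BackoffRetry() {}

    /** Delay before retry number {@code attempt} (0-based): base * 2^attempt, capped at maxMillis. */
    public static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        // Clamp the shift so the doubling cannot overflow for large attempt counts.
        int shift = Math.min(Math.max(attempt, 0), 30);
        return Math.min(maxMillis, baseMillis << shift);
    }

    /**
     * Runs {@code action} up to {@code maxAttempts} times, sleeping between failures.
     * This sketch retries on any Exception except InterruptedException; real code
     * would likely retry only on connection-related failures.
     */
    public static <T> T retry(Callable<T> action, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```

With a base of 1000 ms and a cap of 30000 ms, the waits would be 1 s, 2 s, 4 s, 8 s, 16 s, and then 30 s for every later retry. Adding random jitter to each delay is a common refinement when many agents reconnect at once; it is left out here to keep the sketch short.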

PertsevRoman, May 24 '22 07:05

Merge conflict, possibly with #603 etc. The idea is fine but this is probably a candidate for closing unless the author still has time and interest in cleaning it up.

jglick, Nov 29 '22 18:11