
TCP connection to reverse proxy remains open when websocket connection fails

Open typalm opened this issue 3 years ago • 11 comments

Jetty version 9.4.36

Java version 1.8

Context: I have two Java daemon processes that need to communicate via websockets. These processes are not guaranteed to be on the same machine, so I'm using Nginx as a reverse proxy on the server machine. Sometimes the client is routed through a load balancer to reach the right machine.

Question: I've found that once the client attempts a connection, a TCP connection between the client and Nginx persists. This happens even if the server process isn't running. Is there a way to force the TCP connection to close/reset so that the next attempt goes back through the load balancer? Or is this something that's done via Nginx config?

Additional Details: I've been conforming to the JSR 356 standard to the best of my ability, but that isn't a hard requirement.

Example client code making the connection:

ClientContainer container = new ClientContainer();    // Only one of these is currently being instantiated and configured for the process lifetime
container.getSslContextFactory().setTrustStorePath("truststore path");
container.getSslContextFactory().setTrustStorePassword("changeit");
container.getSslContextFactory().setEndpointIdentificationAlgorithm("https");

// logic repeated on each connection attempt
try {
    container.start();
    container.connectToServer(
            new ClientEndpoint(),
            ClientEndpointConfig.Builder.create().configurator(new CustomConfigurator()).build(),
            new URI("wss://something")
    );
} catch (Exception ex) {
    logger.error("Failed to connect", ex);
}

Example Client endpoint

public class ClientEndpoint extends Endpoint {

    @Override
    public void onOpen(Session session, EndpointConfig endpointConfig) {
        logger.info("Connection opened");
    }

    @Override
    public void onClose(Session session, CloseReason closeReason) {
        logger.info("Connection closed");
    }

    @Override
    public void onError(Session session, Throwable thr) {
        logger.error("Connection error", thr);
    }
}

Solution So Far: This workaround creates a new ClientContainer on every attempt. I know this goes against the recommendation, but it's the best I have so far.

// logic repeated on each connection attempt
ClientContainer container = new ClientContainer();    // This time a container is created on every connection attempt
container.getSslContextFactory().setTrustStorePath("truststore path");
container.getSslContextFactory().setTrustStorePassword("changeit");
container.getSslContextFactory().setEndpointIdentificationAlgorithm("https");

try {
    container.start();
    container.connectToServer(
            new ClientEndpoint(),
            ClientEndpointConfig.Builder.create().configurator(new CustomConfigurator()).build(),
            new URI("wss://something")
    );
} catch (Exception ex) {
    logger.error("Failed to connect", ex);
    LifeCycle.stop(container);    // release the container (and its connections) after a failed attempt
}
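
The reasoning is that stopping the container also stops its underlying HttpClient (the stack trace further down shows the upgrade going through org.eclipse.jetty.client), which should release any TCP connection left over from the failed attempt, at the cost of building and tearing down a container, with its thread pools, on every attempt.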

Example Client endpoint

public static class ClientEndpoint extends Endpoint {

    @Override
    public void onOpen(Session session, EndpointConfig endpointConfig) {
        logger.info("Connection opened");
    }

    @Override
    public void onClose(Session session, CloseReason closeReason) {
        logger.info("Connection closed");
        LifeCycle.stop(session.getContainer());    // for cleaning up the containers that were successfully used
    }

    @Override
    public void onError(Session session, Throwable thr) {
        logger.error("Connection error", thr);
    }
}

typalm avatar Jun 09 '22 20:06 typalm

From the point of view of the client, the connection to nginx succeeded, even if the backend server (behind nginx) is not running.

In this particular case (backend server not running), nginx may still cache connections rather than closing them. I suggest you look into the nginx configuration to see whether there is an option to close such connections when the backend server is down.
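
For example, nginx's keepalive_timeout and keepalive_requests directives control how long nginx keeps client-side connections open after serving a response; lowering keepalive_timeout (or setting it to 0 for the WebSocket location) would make nginx close the connection after a failed upgrade response. Whether that fits your setup is something to verify against the nginx documentation.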

If you send a message and it fails because the backend server is down, you can close the connection from the client, as connectToServer(...) returns a Session and you can call Session.close().
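
A minimal sketch of that pattern, reusing the container, ClientEndpoint, and wss://something URI from the earlier examples (the "ping" payload is made up for illustration):

// uses javax.websocket.{Session, CloseReason, ClientEndpointConfig}
try {
    Session session = container.connectToServer(
            new ClientEndpoint(),
            ClientEndpointConfig.Builder.create().build(),
            new URI("wss://something"));
    try {
        session.getBasicRemote().sendText("ping");
    } catch (IOException sendFailure) {
        // The backend behind nginx is likely down: close the client side
        // so the next attempt opens a fresh TCP connection.
        session.close(new CloseReason(
                CloseReason.CloseCodes.GOING_AWAY, "backend unreachable"));
    }
} catch (Exception ex) {
    logger.error("Failed to connect", ex);
}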

sbordet avatar Jun 13 '22 14:06 sbordet

If you send a message and it fails because the backend server is down, you can close the connection from the client, as connectToServer(...) returns a Session and you can call Session.close().

I'm unable to do this. When the backend (behind nginx) is down, connectToServer(...) throws an exception, so I never get access to the Session.

Connection Attempt with Log

ClientContainer container = new ClientContainer();    // Only one of these is currently being instantiated and configured for the process lifetime
container.getSslContextFactory().setTrustStorePath("truststore path");
container.getSslContextFactory().setTrustStorePassword("changeit");
container.getSslContextFactory().setEndpointIdentificationAlgorithm("https");

// logic repeated on each connection attempt
try {
    container.start();
    container.connectToServer(
            new ClientEndpoint(),
            ClientEndpointConfig.Builder.create().configurator(new CustomConfigurator()).build(),
            new URI("wss://something")
    );
    logger.info("Successfully connected");    // this log is never reached on failed connection attempts
} catch (Exception ex) {
    logger.error("Failed to connect", ex);
}

Stack Trace

java.io.IOException: Connect failure
	at org.eclipse.jetty.websocket.jsr356.ClientContainer.connect(ClientContainer.java:263)
	at org.eclipse.jetty.websocket.jsr356.ClientContainer.connectToServer(ClientContainer.java:286)
	at scrubbed-entry-point
Caused by: org.eclipse.jetty.websocket.api.UpgradeException: Failed to upgrade to websocket: Unexpected HTTP Response Status Code: 504 Gateway Time-out
	at org.eclipse.jetty.websocket.client.WebSocketUpgradeRequest.onComplete(WebSocketUpgradeRequest.java:537)
	at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:218)
	at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:210)
	at org.eclipse.jetty.client.HttpReceiver.terminateResponse(HttpReceiver.java:481)
	at org.eclipse.jetty.client.HttpReceiver.terminateResponse(HttpReceiver.java:461)
	at org.eclipse.jetty.client.HttpReceiver.responseSuccess(HttpReceiver.java:424)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.messageComplete(HttpReceiverOverHTTP.java:365)
	at org.eclipse.jetty.http.HttpParser.handleContentMessage(HttpParser.java:586)
	at org.eclipse.jetty.http.HttpParser.parseContent(HttpParser.java:1711)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1540)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:204)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:144)
	at org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:79)
	at org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:131)
	at org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:169)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:540)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:395)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:161)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
	... 1 more

Closing Note: I'm investigating what on the nginx side could be keeping this connection open.

typalm avatar Jun 13 '22 15:06 typalm

So if the client could not connect to nginx, how do you know you have a connection open? Seems to me that no connection was created, so there is none open.

sbordet avatar Jun 13 '22 15:06 sbordet

By monitoring with netstat or an equivalent.

The moment the client process makes its first connection attempt, I can see a connection with status ESTABLISHED between the client process and nginx. I can leave it for 10+ minutes and the connection is never dropped; the local/foreign addresses remain the same.

During this time I can start the back end server process so that a full ws connection is established. On killing the back end server process, the persistent connection to nginx gets destroyed, at least until the client process attempts to connect again, which creates another persistent connection.

typalm avatar Jun 13 '22 15:06 typalm

Sorry it's not clear.

You see a connection ESTABLISHED and an exception on the client?

The last paragraph is not clear: you say that the connection to nginx gets destroyed, but you complain in other comments that it is not.

Please detail exactly what your problem is, the evidence you have, the tools you use, etc., so it is clear where in Jetty the problem could be.

sbordet avatar Jun 13 '22 16:06 sbordet

You see a connection ESTABLISHED and an exception on the client?

Correct. This is when the client process is attempting connections and the back end server process is NOT running.

The last paragraph is not clear: you say that the connection to nginx gets destroyed, but you complain in other comments that it is not.

And yeah. My questions have not been consistent with my findings. Thanks for pointing that out.

Fixed Question: Once the client process attempts a connection, a TCP connection between the client and Nginx persists while the client keeps making connection attempts. Once a ws connection is established between the end processes, it continues to use the TCP connection made during the first ws connection attempt. That connection does get cleaned up properly once a full ws connection has been established (either when the server process dies or when the Session is closed). The crux of my problem is that the persistent TCP connection affects subsequent ws connection attempts, which can leave the system stuck where it cannot connect (because the load balancer points to a location other than the one holding the persistent TCP connection).

Additional Observed Behavior: when the ws connection between the end processes is established:

  1. and the back end server process is killed, the TCP connection between the client process and nginx is closed
  2. and the ws connection is closed via the Session object, the TCP connection appears to be closing. More precisely, the connection status changes to TIME_WAIT as the connection gets phased out.

typalm avatar Jun 13 '22 17:06 typalm

Sorry it's still not clear.

A websocket connection is a persistent connection until it's explicitly closed by either party.

It has been found that once the client process attempts a connection, a TCP connection between the client and Nginx will persist while the client is still making connection attempts.

This makes no sense.

If a TCP connection is established between the client and nginx, it will persist until explicitly closed.

If the client initiates other connections, the existing ones will remain untouched and will continue to exist, while new connections are being established.

Can you please state what the problem is, what do you expect, and what do you see instead?

sbordet avatar Jun 14 '22 06:06 sbordet

Problem: A failed WS connection impacts subsequent connection attempts. After the first WS connection attempt fails, subsequent attempts aren't routed through the WS URL.

Desired Behavior

  1. Client makes the first attempt to establish a WS connection to the back end server.
  2. A first WS connection attempt that fails results in:
     a. No open connections or other fragments left over from the attempt.
  3. Client continues to make WS connection attempts to the back end server. During this time:
     a. The WS connection is established through a new TCP connection, since no previous one exists.

Observed Behavior

  1. Client makes the first attempt to establish a WS connection to the back end server.
  2. The first WS connection attempt fails, which results in:
     a. A Session object never being received, since the connectToServer(...) call threw an exception.
     b. A persistent TCP connection visible between the client and nginx.
  3. Client continues to make WS connection attempts to the back end server. During this time:
     a. The TCP connection established in 2b never appears to change (local/foreign addresses and connection status stay the same).
     b. The connection attempts appear to reuse the TCP connection from 2b. I suspect this because subsequent connection attempts never hit the load balancer that the WS URL points to.

typalm avatar Jun 20 '22 19:06 typalm

I still have doubts about what you say you're seeing, as Jetty does not do 3, and 3b does not make sense. Can you provide evidence of what is happening? Client DEBUG logs, or better yet a network trace with e.g. Wireshark. Are you on Windows? Did you disable antivirus, firewalls, etc.?

sbordet avatar Jun 20 '22 21:06 sbordet

How does the server fail the first WS Connection attempt? Details please.

joakime avatar Jun 20 '22 21:06 joakime

@typalm I think there is a misunderstanding / wrong expectations that need to be cleared up.

A WebSocket upgrade can happen with both HTTP/1.1 and HTTP/2. While with HTTP/1.1 there may be some expectation that the connection is closed if the WebSocket upgrade fails, I don't think that's the case in general, as it is a perfectly valid HTTP connection that has just served a successful request/response.

This is even more true for HTTP/2, where an application cannot just close connections on a multiplexed HTTP/2 connection that may be used concurrently for other HTTP requests, including other WebSocket upgrades.

I too wrongly expected that for HTTP/1.1 a failed WebSocket upgrade would close the connection, but @lachlan-roberts pointed out that you may fail the upgrade for /foo yet succeed the upgrade for /bar, so reusing the connection is actually a feature.

So I don't think the client is doing something wrong.

It is the server that controls the connection behavior. An application that uses HTTP/1.1 may add Connection: close to a WebSocket upgrade attempt that is answered with a 200 or 404, etc. For HTTP/2 it would be more difficult, because the connection is not influenced by single requests.
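
For HTTP/1.1, an illustrative server-side sketch of that idea using the standard Servlet API (the filter name and mapping are hypothetical, not an existing Jetty API, and it can only work while the response is not yet committed):

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CloseFailedUpgradeFilter implements Filter {
    @Override
    public void init(FilterConfig config) {}

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        chain.doFilter(request, response);
        // If a WebSocket upgrade was attempted but not answered with
        // 101 Switching Protocols, ask the client to close the connection.
        boolean upgradeAttempted = "websocket".equalsIgnoreCase(request.getHeader("Upgrade"));
        if (upgradeAttempted && response.getStatus() != 101 && !response.isCommitted())
            response.setHeader("Connection", "close");
    }

    @Override
    public void destroy() {}
}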

If you have a load balancer / reverse proxy that directs connection1 to server1, and server1 goes down, then it's the load balancer that must close connection1, unless it knows how to route traffic to server2 (which is unlikely because it's subject to a data race -- data may be arriving at any moment).

Seems to me you have a failover problem on the load balancer / server (which is not closing connections when it should) rather than a client issue.

The last resort may be to configure the client to use a connection for just one request, but this may paint you into a corner, so I would suggest that you look harder at the load balancer / server first.

HttpClientTransport transport = new HttpClientTransportOverHTTP(connector);
transport.setConnectionPoolFactory(destination -> {
  DuplexConnectionPool pool = ...;    // create the pool for this destination
  pool.setMaxUsageCount(1);    // retire each connection after a single use
  return pool;    // the factory must return the pool it creates
});
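
(A note on the sketch above: setMaxUsageCount(int) comes from Jetty's AbstractConnectionPool, available in later 9.4.x releases. With a max usage count of 1, every upgrade attempt opens a brand-new TCP connection, so each retry goes back through the load balancer, at the cost of losing all connection reuse.)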

sbordet avatar Jun 21 '22 09:06 sbordet

This issue has been automatically marked as stale because it has been a full year without activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jun 22 '23 03:06 github-actions[bot]

No further comments, please reopen if this is still an issue.

sbordet avatar Jun 23 '23 16:06 sbordet

Hi, I'm encountering the same issue. I proxy_pass TCP connections from my nginx server to a set of load-balanced backend servers. The connection is established correctly once the client connects through my DNS name pointed at the proxy server, but it still shows up even after the connection from nginx to the backend server is terminated.

keithbenedicto-personal avatar Nov 28 '23 16:11 keithbenedicto-personal

@typalm did you ever find a solution for this? We can't set termination headers on the client side, so we have to tinker with the nginx configuration itself. Thanks.

keithbenedicto-personal avatar Nov 28 '23 16:11 keithbenedicto-personal