PROTON-2823: Close transport tail when freeing connection to avoid the client application hanging when TCP FIN+ACK arrives along with TCP RST.

Open anuchandy opened this issue 9 months ago • 1 comments

We’ve noticed that once a connection to the broker is established, if the broker network initiates a TCP connection close with FIN+ACK along with RST, Proton-J does not signal a terminal event (e.g., transport close). This leaves the client application unable to detect the connection termination and recover.

The traffic flow is:

  1. The TCP layer on the service side sends FIN+ACK.
  2. The client TCP layer responds with FIN+ACK.
  3. The TCP layer on the service side sends TCP RST.
  4. Proton-J does not propagate any terminal event (e.g., transport close) to the registered handler, so the application cannot react to the disconnect.

Below is a Wireshark capture of this traffic:

image

Below are the logs from Proton-J in response to this traffic. Proton-J emits no further logs after these:

FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = input [SSLEngineResult status = CLOSED handshakeStatus = NEED_WRAP bytesConsumed = 31 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 31]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]

While analyzing this scenario, we noticed the following inside Proton-J:

  1. The Proton-J SimpleSslTransportWrapper layer does detect this close, and it sets its _head_closed flag to true, indicating that the outbound side is closed.
  2. Proton-J TransportImpl even generates the TRANSPORT_HEAD_CLOSED event, but it never generates the TRANSPORT_TAIL_CLOSED event, nor the TRANSPORT_CLOSED event, which requires the tail to be closed as well.
  3. Proton-J also frees all the internally registered Selectables, for example the Selectable associated with the TCP connection, and gracefully shuts down these resources (and the associated channels and timers tracking idle timeout).
  4. We can also see that the internal connectionFree handler in the IOHandler is invoked (note: none of the other terminal handlers, connectionError or connectionExpired, is invoked).
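To illustrate point 2, here is a minimal, self-contained sketch of the gating described above: a closed event is posted only once both the head and the tail are closed. The class `TransportCloseModel` and its methods are hypothetical stand-ins written for this illustration, not Proton-J's `TransportImpl`; only the event names are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the close bookkeeping described above.
// This is NOT Proton-J code; only the event names mirror the description.
final class TransportCloseModel {
    enum Event { TRANSPORT_HEAD_CLOSED, TRANSPORT_TAIL_CLOSED, TRANSPORT_CLOSED }

    private boolean headClosed;
    private boolean tailClosed;
    private final List<Event> events = new ArrayList<>();

    // Outbound side closed (what the SSL wrapper triggers in the scenario).
    void closeHead() {
        if (!headClosed) {
            headClosed = true;
            events.add(Event.TRANSPORT_HEAD_CLOSED);
            maybePostClosed();
        }
    }

    // Inbound side closed (the step that never happens in the scenario).
    void closeTail() {
        if (!tailClosed) {
            tailClosed = true;
            events.add(Event.TRANSPORT_TAIL_CLOSED);
            maybePostClosed();
        }
    }

    // The terminal event is posted only when both halves are closed.
    private void maybePostClosed() {
        if (headClosed && tailClosed) {
            events.add(Event.TRANSPORT_CLOSED);
        }
    }

    List<Event> events() {
        return events;
    }

    public static void main(String[] args) {
        TransportCloseModel transport = new TransportCloseModel();
        transport.closeHead();
        System.out.println(transport.events()); // head closed only, no terminal event
        transport.closeTail();
        System.out.println(transport.events()); // terminal event now present
    }
}
```

In this model, an application waiting for `TRANSPORT_CLOSED` would wait indefinitely if only `closeHead()` were ever called, which matches the hang reported here.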

This PR addresses the problem by closing the transport tail when the connection is freed, which results in Proton-J signaling the transport termination to the application handlers. With this fix, the application is able to detect the connection drop and recover.
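The effect of the change can be sketched with a small, self-contained simulation that compares what a handler observes with and without the tail being closed when the connection is freed. Everything here (`CloseTailOnFreeSketch`, `RecordingHandler`, `simulate`, the `closeTailOnFree` switch) is a hypothetical name made up for the illustration; it is not the code in this PR and not the Proton-J API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Proton-J classes.
final class CloseTailOnFreeSketch {

    // Records the transport events an application handler would observe.
    static final class RecordingHandler {
        final List<String> seen = new ArrayList<>();

        void onEvent(String name) {
            seen.add(name);
        }
    }

    static final class Transport {
        private final RecordingHandler handler;
        private boolean headClosed;
        private boolean tailClosed;

        Transport(RecordingHandler handler) {
            this.handler = handler;
        }

        void closeHead() {
            if (headClosed) return;
            headClosed = true;
            handler.onEvent("TRANSPORT_HEAD_CLOSED");
            maybePostClosed();
        }

        void closeTail() {
            if (tailClosed) return;
            tailClosed = true;
            handler.onEvent("TRANSPORT_TAIL_CLOSED");
            maybePostClosed();
        }

        private void maybePostClosed() {
            if (headClosed && tailClosed) {
                handler.onEvent("TRANSPORT_CLOSED");
            }
        }
    }

    // Simulates the reported sequence: the SSL layer closes the head, then the
    // connection is freed. The flag models whether freeing also closes the tail.
    static List<String> simulate(boolean closeTailOnFree) {
        RecordingHandler handler = new RecordingHandler();
        Transport transport = new Transport(handler);

        transport.closeHead(); // close detected by the SSL wrapper

        // Connection freed here (resource release is not modelled).
        if (closeTailOnFree) {
            transport.closeTail();
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + simulate(false));
        System.out.println("with fix:    " + simulate(true));
    }
}
```

With the flag off, the handler sees only `TRANSPORT_HEAD_CLOSED`; with it on, it also sees `TRANSPORT_TAIL_CLOSED` and `TRANSPORT_CLOSED`, which is the signal an application would use to start recovery.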

I’m new to the Proton-J code base, so could experts from the Proton-J community (cc @gemmellr) please take a look at this fix?

anuchandy avatar May 14 '24 17:05 anuchandy

Linking to JIRA https://issues.apache.org/jira/browse/PROTON-2823

anuchandy avatar May 14 '24 18:05 anuchandy