qpid-proton-j
PROTON-2823: Close transport tail when freeing connection to avoid the client application hanging when TCP FIN+ACK arrives along with TCP RST.
We’ve noticed that, after a connection to the broker is established, if the broker network initiates a TCP connection close (FIN+ACK along with RST), Proton-J does not signal a terminal event (e.g., transport close). This leaves the client application unable to detect the connection termination and recover.
The traffic flow is:
- TCP layer on the service side sends FIN+ACK
- The client TCP layer responds with FIN+ACK
- TCP layer on the service side sends TCP RST
- Proton-J does not propagate any terminal event (e.g., transport close) to the registered handler, so the application cannot react to this disconnect.
[Image: Wireshark capture of this traffic]
Below are the logs from Proton-J in response to this traffic. Proton-J emits no further logs after these:
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = input [SSLEngineResult status = CLOSED handshakeStatus = NEED_WRAP bytesConsumed = 31 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 31]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
FINEST o.a.q.p.e.i.s.SimpleSslTransportWrapper [reactor-executor-1] useClientMode = true direction = output [SSLEngineResult status = CLOSED handshakeStatus = NOT_HANDSHAKING bytesConsumed = 0 bytesProduced = 0]
While analyzing this, we noticed the following inside Proton-J when it happens:
- The Proton-J SimpleSslTransportWrapper layer does in fact detect this close, and it sets its _head_closed flag to true, indicating that the outbound side is closed.
- Proton-J TransportImpl even generates a TRANSPORT_HEAD_CLOSED event, but it never generates a TRANSPORT_TAIL_CLOSED event or a TRANSPORT_CLOSED event, the latter of which also requires the tail to be closed.
- Proton-J also frees all the internally registered Selectables, for example the Selectable associated with the TCP connection, and gracefully shuts down these resources (and the associated channels and timers tracking idle timeout).
- We can also see that the internal connectionFree handler in the IOHandler is invoked. Note that none of the other terminal handlers (connectionError, connectionExpired) are invoked.
This PR addresses the issue by closing the transport tail when the connection is freed, so that Proton-J signals the transport termination to the application handlers. With this fix, the application is able to detect the connection drop and recover.
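To illustrate why closing the tail matters, here is a minimal, self-contained sketch of the event gating described above. It is a toy model only: `ToyTransport`, `simulateDisconnect`, and the `closeTailOnFree` flag are hypothetical names invented for this illustration and are not Proton-J's TransportImpl or its API. The only behavior assumed is the one noted in the analysis, namely that TRANSPORT_CLOSED is emitted once both the head and the tail are closed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, NOT Proton-J's TransportImpl.
final class TransportCloseSketch {

    static final class ToyTransport {
        private boolean headClosed;
        private boolean tailClosed;
        private final List<String> events = new ArrayList<>();

        void closeHead() {
            if (!headClosed) {
                headClosed = true;
                events.add("TRANSPORT_HEAD_CLOSED");
                maybeEmitClosed();
            }
        }

        void closeTail() {
            if (!tailClosed) {
                tailClosed = true;
                events.add("TRANSPORT_TAIL_CLOSED");
                maybeEmitClosed();
            }
        }

        // The terminal event is only emitted once both sides are closed.
        private void maybeEmitClosed() {
            if (headClosed && tailClosed) {
                events.add("TRANSPORT_CLOSED");
            }
        }

        List<String> events() {
            return events;
        }
    }

    // Hypothetical stand-in for the connection-free path;
    // closeTailOnFree toggles the behavior this PR proposes.
    static List<String> simulateDisconnect(boolean closeTailOnFree) {
        ToyTransport transport = new ToyTransport();
        transport.closeHead();      // the SSL wrapper detected the close
        if (closeTailOnFree) {
            transport.closeTail();  // close the tail when the connection is freed
        }
        return transport.events();
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + simulateDisconnect(false));
        System.out.println("with fix:    " + simulateDisconnect(true));
    }
}
```

In this model, the run without the fix records only TRANSPORT_HEAD_CLOSED, which mirrors the hang seen by the application, while the run with the fix records TRANSPORT_HEAD_CLOSED, TRANSPORT_TAIL_CLOSED, and then TRANSPORT_CLOSED.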
I’m new to the Proton-J code base, so could experts from the Proton-J community (cc @gemmellr) please take a look at this fix?
Linking to JIRA https://issues.apache.org/jira/browse/PROTON-2823