Race between ResendRequest and SequenceReset when using async SessionProxy

Open pcdv opened this issue 11 months ago • 3 comments

Consider the following scenario:

  • There are two FIX counterparties: ARTIO and EXCHANGE
  • After a disconnection, ARTIO has lastReceivedSeqNo = 6 and EXCHANGE has lastReceivedSeqNo = 4
  • For some reason (for example, because the connection closed before the Logout was received), the last message sent by each party was not received by the other, so ARTIO has lastSentSeqNo = 5 and EXCHANGE has lastSentSeqNo = 7
  • On the next connection, both parties detect a gap of one message and send a ResendRequest: [7-0] from ARTIO, and [5-0] from EXCHANGE

After the second connection, ARTIO sent its messages in the wrong order:

  • Logon with MsgSeqNum = 6
  • SequenceReset with MsgSeqNum = 5 NewSeqNo = 8
  • ResendRequest with MsgSeqNum = 7

The sequence would be correct (IMHO) if the ResendRequest were sent before the SequenceReset, or if the SequenceReset carried NewSeqNo = 7. As sent, the SequenceReset moves EXCHANGE's expected MsgSeqNum to 8, so the ResendRequest that follows with MsgSeqNum = 7 triggers a "MsgSeqNum too low" error and EXCHANGE disconnects.
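To make the ordering argument concrete, here is a minimal sketch of how a receiver such as EXCHANGE might validate inbound sequence numbers. It is a deliberately simplified model, not Artio or exchange code: the class `SeqNumOrderingSketch`, the `Msg` type and the `firstRejected` method are hypothetical names introduced for illustration, and the model assumes that a message with a MsgSeqNum above the expected value is tolerated while a gap fill is pending.

```java
import java.util.Arrays;
import java.util.List;

public class SeqNumOrderingSketch {

    /** Hypothetical, simplified message: newSeqNo only matters for SequenceReset. */
    static final class Msg {
        final String type;
        final int msgSeqNum;
        final int newSeqNo;

        Msg(String type, int msgSeqNum, int newSeqNo) {
            this.type = type;
            this.msgSeqNum = msgSeqNum;
            this.newSeqNo = newSeqNo;
        }
    }

    /**
     * Assumed receiver model: returns the type of the first message rejected
     * as "MsgSeqNum too low", or null if every message is accepted.
     */
    static String firstRejected(int expectedSeqNum, List<Msg> inbound) {
        int expected = expectedSeqNum;
        for (Msg m : inbound) {
            if (m.msgSeqNum < expected) {
                return m.type; // too low: the receiver would disconnect here
            }
            if (m.msgSeqNum == expected) {
                // A SequenceReset jumps the expected number to NewSeqNo;
                // any other in-sequence message advances it by one.
                expected = m.type.equals("SequenceReset") ? m.newSeqNo : expected + 1;
            }
            // msgSeqNum > expected: a gap is pending, so keep waiting for the fill.
        }
        return null;
    }

    public static void main(String[] args) {
        // EXCHANGE has lastReceivedSeqNo = 4, so it expects MsgSeqNum = 5.
        List<Msg> observed = Arrays.asList(
            new Msg("Logon", 6, 0),
            new Msg("SequenceReset", 5, 8),
            new Msg("ResendRequest", 7, 0));
        List<Msg> resendRequestFirst = Arrays.asList(
            new Msg("Logon", 6, 0),
            new Msg("ResendRequest", 7, 0),
            new Msg("SequenceReset", 5, 8));
        List<Msg> newSeqNoSeven = Arrays.asList(
            new Msg("Logon", 6, 0),
            new Msg("SequenceReset", 5, 7),
            new Msg("ResendRequest", 7, 0));

        System.out.println("observed:             " + firstRejected(5, observed));           // ResendRequest
        System.out.println("ResendRequest first:  " + firstRejected(5, resendRequestFirst)); // null
        System.out.println("NewSeqNo = 7:         " + firstRejected(5, newSeqNoSeven));      // null
    }
}
```

Under these assumptions, only the observed order is rejected; both alternatives suggested above pass the check.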

The problem seems related to the fact that the SequenceReset is sent directly by the Replayer when it has finished replaying, while the ResendRequest is sent directly by the library when it detects the high MsgSeqNum in the Logon message.

I should add that I use a SessionProxy to route all outbound messages through a cluster, which probably makes this bug more likely because it delays the sending of the ResendRequest. NB: it seems that SessionProxy#sendSequenceReset() is never called, so I don't have the opportunity to override NewSeqNo with a correct value. SessionProxy#sendResendRequest(), however, is called as expected.

I'm able to reproduce the bug easily using my custom setup/code. I will try to reproduce it using pure artio dependencies.

pcdv, Mar 14 '24 16:03