malamute
malamute copied to clipboard
Reconnecting issue
I'm facing time to time some issue when the server looping (and leaking) on a "invalid command" message. I instrumented this message in order to know which command and which state. The command is stream_read_event in start_state. After few investigations I may understand why i'm falling in this "invalid command" and why looping on it. For example I have other invalid command:
I: 19-04-02 02:23:27 client 93 address='(null)' - invalid command 13 in state 1
I: 19-04-02 02:23:27 client 93 address='AA-5eec8db4-6095-441c-9f33-3143f3a3e605' - registering
I: 19-04-02 02:23:27 client 94 address='(null)' - invalid command 13 in state 1
I: 19-04-02 02:23:27 client 94 address='BB-f83308d6-4048-4302-b63b-1ef9f658c822' - registering
But in this case, the client immediatly register again. In my loop I have the following
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
I: 19-04-02 02:52:47 client 77 address='(null)' - invalid command 4 in state 1
many Hundred thousand lines.
I'm thing that the issue is the state reconnecting of the client. This state handling reconnecting and replay, for example the "set consumer" event is replayed after the reconnection, which seems good.
- But what happen if the program calls during this reconnection, the function mlm_client_set_consumer ? In this case the client will replay the first message (zlistx_next : First time, acts as zlistx_first()) without waiting the reconnection
- During replay a set_consumer command will produce an OK from the server, OK will restart the replay list (zlistx_first)
My loop is difficult to reproduce but I captured an other case with the client log whre the second case happens:
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : OK
D: 19-04-03 11:49:43 220849:mlm_client : $ client is connected
D: 19-04-03 11:49:43 220849:mlm_client : $ get first replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : set_consumer
D: 19-04-03 11:49:43 220849:mlm_client : $ send STREAM_READ
D: 19-04-03 11:49:43 220849:mlm_client : $ get next replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : set_consumer
D: 19-04-03 11:49:43 220849:mlm_client : $ send STREAM_READ
D: 19-04-03 11:49:43 220849:mlm_client : $ get next replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : OK
D: 19-04-03 11:49:43 220849:mlm_client : $ client is connected
D: 19-04-03 11:49:43 220849:mlm_client : $ get first replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : set_consumer
D: 19-04-03 11:49:43 220849:mlm_client : $ send STREAM_READ
D: 19-04-03 11:49:43 220849:mlm_client : $ get next replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : set_consumer
D: 19-04-03 11:49:43 220849:mlm_client : $ send STREAM_READ
D: 19-04-03 11:49:43 220849:mlm_client : $ get next replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : OK
D: 19-04-03 11:49:43 220849:mlm_client : $ client is connected
D: 19-04-03 11:49:43 220849:mlm_client : $ get first replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : set_consumer
D: 19-04-03 11:49:43 220849:mlm_client : $ send STREAM_READ
D: 19-04-03 11:49:43 220849:mlm_client : $ get next replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : set_consumer
D: 19-04-03 11:49:43 220849:mlm_client : $ send STREAM_READ
D: 19-04-03 11:49:43 220849:mlm_client : $ get next replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
D: 19-04-03 11:49:43 220849:mlm_client : reconnecting:
D: 19-04-03 11:49:43 220849:mlm_client : OK
D: 19-04-03 11:49:43 220849:mlm_client : $ client is connected
D: 19-04-03 11:49:43 220849:mlm_client : $ get first replay command
D: 19-04-03 11:49:43 220849:mlm_client : > reconnecting
...
Shouldn't we split the reconnecting state in reconnecting and replay states?