RaftKeeper
RaftKeeper copied to clipboard
`test_session/test.py::test_reconnection` is flaky
The failed test test_session/test.py::test_reconnection
is caused by client.
When raftkeeper node restared. Client will try to reconnect to server
- client
192.168.32.1:42492
send update session request with session id 7 - Session timed out and the client just try to create new session request, but he internal id of raftkeeper is accidentally is 7
- Because both upate session and new session callback use the same container the ids conflict.
2024-05-16T07:00:23.8159265Z 2024.05.16 06:52:12.230635 [ 164 ] <Trace> ConnectionHandler: Peer 192.168.32.1:42492#0x0 is readable
2024-05-16T07:00:23.8159631Z 2024.05.16 06:52:12.230676 [ 164 ] <Trace> ConnectionHandler: Peer 192.168.32.1:42492#0x0 read request length : 45
2024-05-16T07:00:23.8160126Z 2024.05.16 06:52:12.230714 [ 164 ] <Trace> ConnectionHandler: Peer 192.168.32.1:42492#0x0 read request done, body length : 45, req_body_buf used 45
2024-05-16T07:00:23.8160484Z 2024.05.16 06:52:12.230754 [ 164 ] <Debug> ConnectionHandler: Received update session request with session 0x7
2024-05-16T07:00:23.8160803Z 2024.05.16 06:52:12.230812 [ 164 ] <Debug> KeeperDispatcher: Register session response callback 0x7
2024-05-16T07:00:23.8161271Z 2024.05.16 06:52:12.230877 [ 164 ] <Trace> KeeperDispatcher: Push new/update session request #0x7#-512#UpdateSession
2024-05-16T07:00:23.8161735Z 2024.05.16 06:52:12.230940 [ 156 ] <Trace> KeeperDispatcher: Skip to push #0x7#-512#UpdateSession to request processor
2024-05-16T07:00:23.8161958Z 2024.05.16 06:52:12.230965 [ 156 ] <Trace> KeeperDispatcher: Leader is 3
2024-05-16T07:00:23.8162366Z 2024.05.16 06:52:12.231011 [ 128 ] <Trace> ForwardConnection: Forward request #UpdateSession#7#100000 to endpoint node3:8102
2024-05-16T07:00:23.8162679Z 2024.05.16 06:52:12.231462 [ 128 ] <Trace> ForwardConnection: Try connect forward server node3:8102
2024-05-16T07:00:23.8162859Z 2024.05.16 06:52:12.254716 [ 161 ] <Trace> PollSet: Got 1 events
2024-05-16T07:00:23.8163147Z 2024.05.16 06:52:12.254756 [ 161 ] <Trace> SocketReactor: Dispatch event read for 0.0.0.0:8101
2024-05-16T07:00:23.8163453Z 2024.05.16 06:52:12.254792 [ 161 ] <Trace> SocketAcceptor: 0.0.0.0:8101 tries to accept connection
2024-05-16T07:00:23.8163748Z 2024.05.16 06:52:12.254835 [ 161 ] <Trace> SocketAcceptor: Successfully accepted 192.168.32.1:42498
2024-05-16T07:00:23.8164076Z 2024.05.16 06:52:12.254958 [ 161 ] <Information> ConnectionHandler: New connection from 192.168.32.1:42498
2024-05-16T07:00:23.8164297Z 2024.05.16 06:52:12.255002 [ 161 ] <Trace> PollSet: Try to wakeup poll set
2024-05-16T07:00:23.8164599Z 2024.05.16 06:52:12.255248 [ 165 ] <Trace> PollSet: Got 1 events
2024-05-16T07:00:23.8164917Z 2024.05.16 06:52:12.255346 [ 165 ] <Trace> SocketReactor: Dispatch event read for 192.168.32.3:8101
2024-05-16T07:00:23.8165225Z 2024.05.16 06:52:12.255404 [ 165 ] <Trace> ConnectionHandler: Peer 192.168.32.1:42498#0x0 is readable
2024-05-16T07:00:23.8165592Z 2024.05.16 06:52:12.255441 [ 165 ] <Trace> ConnectionHandler: Peer 192.168.32.1:42498#0x0 read request length : 45
2024-05-16T07:00:23.8166082Z 2024.05.16 06:52:12.255498 [ 165 ] <Trace> ConnectionHandler: Peer 192.168.32.1:42498#0x0 read request done, body length : 45, req_body_buf used 45
2024-05-16T07:00:23.8166439Z 2024.05.16 06:52:12.255540 [ 165 ] <Debug> ConnectionHandler: Received new session request with internal id 0x7
2024-05-16T07:00:23.8166744Z 2024.05.16 06:52:12.255584 [ 165 ] <Debug> KeeperDispatcher: Register session response callback 0x7
2024-05-16T07:00:23.8167242Z 2024.05.16 06:52:12.255731 [ 165 ] <Fatal> : Logical error: 'Session response callback with id 0x7 has already registered'.
2024-05-16T07:00:23.8167453Z 2024.05.16 06:52:12.255984 [ 122 ] <Trace> BaseDaemon: Received signal 6
2024-05-16T07:00:23.8167725Z 2024.05.16 06:52:12.256340 [ 172 ] <Fatal> BaseDaemon: ########################################
2024-05-16T07:00:23.8168386Z 2024.05.16 06:52:12.256422 [ 172 ] <Fatal> BaseDaemon: (version v2.0.5, build id: 7762AAB115DF0903) (from thread 165) Received signal Aborted (6)
2024-05-16T07:00:23.8168535Z 2024.05.16 06:52:12.256484 [ 172 ] <Fatal> BaseDaemon:
I will fix it in another PR.
Originally posted by @JackyWoo in https://github.com/JDRaftKeeper/RaftKeeper/issues/262#issuecomment-2116975781