ratis icon indicating copy to clipboard operation
ratis copied to clipboard

RATIS-1519. When DataStreamManagement#read an exception occurs, remove DataStream

Open guohao-rosicky opened this issue 3 years ago • 9 comments

What changes were proposed in this pull request?

When DataStreamManagement#read an exception occurs, remove DataStream

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-1519

guohao-rosicky avatar Feb 07 '22 06:02 guohao-rosicky

@szetszwo Please take a look.

guohao-rosicky avatar Feb 07 '22 06:02 guohao-rosicky

@szetszwo modified. I removed streams.remove () from readImpl, Please take a look

guohao-rosicky avatar Feb 07 '22 10:02 guohao-rosicky

@guohao-rosicky , there are some NPEs in https://github.com/apache/ratis/runs/5103000516?check_suite_focus=true . Please take a look. We may have to update the tests?

szetszwo avatar Feb 08 '22 03:02 szetszwo

@guohao-rosicky , from the debug message you added, it is a Mock for RaftServer. It also showed "DataStream null". that We probably need to update the test.

2022-02-09 13:43:51,230 [Time-limited test] INFO  datastream.DataStreamTestUtils 
(DataStreamTestUtils.java:writeAndCloseAndAssertReplies(315)) - 
test header: RaftClientRequest:client-992CD7CB0548->s0@group-9E34F31FA0DB, cid=0, seq=0, 
DataStream, null, 
server: Mock for RaftServer, hashCode: 1696083569, bytesWritten: 7452308, stepDownLeader: false

szetszwo avatar Feb 09 '22 05:02 szetszwo

@szetszwo

in org.apache.ratis.datastream.DataStreamTestUtils#assertHeader stateMachine.getSingleDataStream in null, I found that it will be added at stateMachine.stream, I wonder why not get it.

Can you help me find it.

guohao-rosicky avatar Feb 10 '22 06:02 guohao-rosicky

@guohao-rosicky , It seems the server behavior was changed. The primary did not forward the request to the next server as shown below that s1 returns null; see https://issues.apache.org/jira/secure/attachment/13039878/596_debug.patch

2022-02-10 20:37:26,654 [Time-limited test] INFO  datastream.DataStreamTestUtils (DataStreamTestUtils.java:writeAndCloseAndAssertReplies(314))
 - start Stream0
2022-02-10 20:37:26,683 [nioEventLoopGroup-3-1] INFO  statemachine.StateMachine (DataStreamTestUtils.java:stream(151))
 - XXX MultiDataStreamStateMachine:s0:group-56A12F14DED8 put 0@client-E43970695D43, SingleDataStream:
 writeRequest=RaftClientRequest:client-E43970695D43->s0@group-56A12F14DED8, cid=0, seq=0, DataStream, Message:<EMPTY>, logEntry=null
2022-02-10 20:37:26,736 [Time-limited test] INFO  datastream.DataStreamTestUtils (DataStreamTestUtils.java:assertHeader(331))
 - XXX s0: dataSize=7478599, stepDownLeader=false, header=RaftClientRequest:client-E43970695D43->s0@group-56A12F14DED8, cid=0, seq=0, DataStream, null
2022-02-10 20:37:26,736 [Time-limited test] INFO  statemachine.StateMachine (DataStreamTestUtils.java:getSingleDataStream(181))
 - XXX MultiDataStreamStateMachine:s0:group-56A12F14DED8: get 0@client-E43970695D43 return SingleDataStream:
 writeRequest=RaftClientRequest:client-E43970695D43->s0@group-56A12F14DED8, cid=0, seq=0, DataStream, Message:<EMPTY>, logEntry=null
2022-02-10 20:37:26,737 [Time-limited test] INFO  datastream.DataStreamTestUtils (DataStreamTestUtils.java:assertHeader(331))
 - XXX s1: dataSize=7478599, stepDownLeader=false, header=RaftClientRequest:client-E43970695D43->s0@group-56A12F14DED8, cid=0, seq=0, DataStream, null
2022-02-10 20:37:26,737 [Time-limited test] INFO  statemachine.StateMachine (DataStreamTestUtils.java:getSingleDataStream(181))
 - XXX MultiDataStreamStateMachine:s1:group-56A12F14DED8: get 0@client-E43970695D43 return null

Please check if the new behavior is correct. If yes, we can update the test.

szetszwo avatar Feb 10 '22 12:02 szetszwo

@szetszwo I apply the patch, I still have an NPE, I'll keep tracking the issues

guohao-rosicky avatar Feb 16 '22 02:02 guohao-rosicky

... I apply the patch, I still have an NPE, ...

@guohao-rosicky , Do you mean https://issues.apache.org/jira/secure/attachment/13039878/596_debug.patch ? It just prints out some dubug messages. Please check if the new behavior is correct. If yes, we can update the test in order to avoid the NPE.

szetszwo avatar Feb 16 '22 04:02 szetszwo

@guohao-rosicky , there are some conflicts. Could you fix them?

szetszwo avatar Nov 02 '23 16:11 szetszwo

@guohao-rosicky , thanks for working not this! On top of the bug fix, the change from RaftServer to Division is great!

szetszwo avatar Apr 08 '24 18:04 szetszwo