ratis
RATIS-1524. Optional DataStreamManagement#startTransaction configuration
What changes were proposed in this pull request?
Optional DataStreamManagement#startTransaction configuration
What is the link to the Apache JIRA
see:
https://issues.apache.org/jira/browse/RATIS-1524
https://issues.apache.org/jira/browse/RATIS-1513
@szetszwo What do you think of this change?
@szetszwo I want to discuss a question about the Ozone BCSID.
Does the BCSID still make sense when a stream is used to transfer the data and putting a block becomes part of the stream process? Since the Raft transport is no longer used for the data, replay of the Raft log is eliminated.
I can open a JIRA in Ozone to discuss this.
@guohao-rosicky, without the BCSID and the Ratis log, the current Ozone design won't work for recovery. Why do we need Ratis (or Raft) in Ozone? Because Ratis provides a consistent view of the data among the servers. Without Ratis, the data in the servers may diverge. How could you tell which one to trust? And how would you do data recovery?
Thank you @szetszwo , I have understood BCSID.
In one of my test versions of Ozone, I generated Raft logs by sending a Raft async RPC through DataStreamManagement#startTransaction, so that the Raft log index could be used as the Ozone BCSID. The throughput was very low because Raft needs to order the requests.
In another test version, skipping DataStreamManagement#startTransaction doubled the throughput. Given that, could we get a BCSID in another way than raft log?
... could we get a BCSID in another way than raft log?
We need a BCSID and also the ability to commit transactions. I guess there are no easy ways. Otherwise, we can use it to replace the Raft Consensus Algorithm in general.
package org.apache.ratis.io;

public enum StandardWriteOption implements WriteOption {
  /** Sync the data to the underlying storage. */
  SYNC,
  /** Close the data to the underlying storage. */
  CLOSE,
  /** Returns a unique ID */
  UNIQUE_ID,
}
@szetszwo Can we add a new WriteOption, as shown above? When the primary node receives a request with this option, it generates a unique ID of type long, synchronizes it to the other nodes, and returns it to the client.
In other words, generating an ID on the Primary node and passing it to the other nodes through the stream could improve throughput, because no Raft request is needed internally.
..., generating an ID on the Primary node ...
How do we make sure the ID is unique? Any of the nodes could be the Primary node at some point in time.
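To make the uniqueness concern concrete, here is a minimal sketch. It assumes the simplest possible scheme, in which each node generates IDs from its own local counter whenever it acts as the Primary. The class and method names are hypothetical and are not part of Ratis, and the PR does not specify how UNIQUE_ID would actually be generated.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; none of these names exist in Ratis. */
class UniqueIdSketch {
  /** A node that hands out IDs from a purely local counter (assumed scheme). */
  static final class Node {
    private final AtomicLong counter = new AtomicLong();

    long nextId() {
      return counter.incrementAndGet();
    }
  }

  /**
   * Two nodes each act as the Primary for requestsPerNode requests.
   * Returns how many of the issued IDs had already been issued by the other node.
   */
  static int countCollisions(int requestsPerNode) {
    final Node a = new Node();
    final Node b = new Node();
    final Set<Long> seen = new HashSet<>();
    int collisions = 0;
    for (int i = 0; i < requestsPerNode; i++) {
      if (!seen.add(a.nextId())) {
        collisions++;
      }
    }
    for (int i = 0; i < requestsPerNode; i++) {
      if (!seen.add(b.nextId())) {
        collisions++;
      }
    }
    return collisions;
  }

  public static void main(String[] args) {
    // Both nodes start counting from 1, so every ID from the second node repeats one from the first.
    System.out.println("collisions = " + countCollisions(3));
  }
}
```

With purely local counters, the second node repeats every ID the first node issued, so a workable scheme would need some state agreed on by all nodes, which is what the Raft log index already provides.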
I can open a JIRA in Ratis and discuss this @szetszwo @captainzmc
https://issues.apache.org/jira/browse/RATIS-1513
@guohao-rosicky, @captainzmc, I really hope that we could fix "TimeoutIOException: Timeout 3000ms". How about we fix it first?
We changed this configuration to fix this problem because DataStreamManagement#startTransaction was taking too long.
We have a test report showing that DataStreamManagement#startTransaction caused "TimeoutIOException: Timeout 3000ms".
@szetszwo I was hoping you could help us come up with a solution.
The only solution I can think of so far is to skip DataStreamManagement#startTransaction, which is what triggers the timeout.
@szetszwo https://docs.google.com/document/d/1mS3GqovQ3D1b7V0L3--VF9xhl5jdId1mSL0cQNb7uHo/edit
This document contains our test report and describes how we located the problem.
We changed this configuration to fix this problem because DataStreamManagement#startTransaction was taking too long.
This is not really a fix since it changes the functionality.
commit f349fdf2488e254d0fa2eeb17bbaf44de2b0932d
Author: hao guo <[email protected]>
Date: Wed Dec 15 10:29:13 2021 +0800
RATIS-1438. Add request timeout to ratis Streaming (#563)
"TimeoutIOException: Timeout 3000ms" probably started happening after RATIS-1438. How about we increase the timeout value, say to 10 seconds?
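The effect of raising the timeout can be sketched with a plain `CompletableFuture`. This is only an illustration using the JDK, not the way Ratis implements the stream request timeout added in RATIS-1438; the class and method names are hypothetical, and the durations are scaled down so the example runs quickly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; this is not the Ratis stream timeout implementation. */
class RequestTimeoutSketch {
  /** Simulate a request that takes workMillis and report whether it finishes within timeoutMillis. */
  static boolean completesWithin(long workMillis, long timeoutMillis) throws InterruptedException {
    final CompletableFuture<String> reply = CompletableFuture.supplyAsync(() -> {
      try {
        Thread.sleep(workMillis);  // stand-in for a slow step such as startTransaction
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "ok";
    }).orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);  // requires Java 9 or later

    try {
      reply.get();
      return true;
    } catch (ExecutionException e) {
      if (e.getCause() instanceof TimeoutException) {
        return false;  // the request was failed by the timeout, not by the work itself
      }
      throw new IllegalStateException(e.getCause());
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // The same slow request fails under a short timeout and succeeds under a longer one.
    System.out.println("short timeout ok? " + completesWithin(300, 100));
    System.out.println("long timeout ok?  " + completesWithin(300, 1000));
  }
}
```

A longer timeout lets the slow request finish, but it does not make the request itself any faster.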
The timeout can be changed to 10 seconds. I will submit a new PR for this.
@szetszwo This document contains our test report and describes how we located the problem: https://docs.google.com/document/d/1mS3GqovQ3D1b7V0L3--VF9xhl5jdId1mSL0cQNb7uHo/edit
Could you take a look at the test report and consider how to further optimize the performance of Ratis streaming in Ozone, based on the problems it identifies?
We would like to discuss how to optimize the design.
@captainzmc and I can participate in the development of the optimization.