ozone
ozone copied to clipboard
HDDS-6848. Ignore timeout replication tasks on datanode
What changes were proposed in this pull request?
If a datanode crashes, the blocks on the datanode will be added to inflightReplication to fulfill the replication factor.
When the datanode is restarted, the containers don't need to be replicated anymore, but the replication commands have been sent to datanodes. If the crashed datanodes holds many containers, there will be many replication commands which will cause jams on Datanodes's replication queue, and the big replication traffic will also cause problems to the whole cluster.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6848
How was this patch tested?
unit test
@adoroszlai @sodonnel @ferhui Could you help to review this PR?
There is some work going on to fix this sort of issue in different ways. #3482 is going to limit the number of replications that can be inflight at any given times, and hence should avoid deep queues on the DNs.
As part of EC, we are working toward replacing replication manager completely, and when we do that, we will be holding back the replication work in SCM and feeding it to the datanodes when they have capacity to take it. The queues on the DNs should be very small when that happens.
While the approach here looks OK, I feel we should hold off on committing it until we see how well #3482 works, and larger RM refactor. Once those are in, this change would not be needed I think.
@sodonnel Thanks for the review. Since it's causing severe issues in our cluster, I think we might need to use this version of fix internally first.
Talking about #3482 , I'm not sure if they are solving the same problem. If a task is marked timeout in RM, but still processed by datanode, the work done by datanode side is totally wasted. I think only if RM can handle timeout tasks better can this problem be mitigated.
@symious I was discussing some replication related topics today and remembered about this change. I think we should move ahead with this change, and also a similar change for deleteContainer commands, but we can do it in a new PR once we have this one done. I will give this a quick review and then perhaps you could try to fix the conflicts if you still have time to work on this?
I think I have a way to do this somewhat generically for all commands, so please hold off on working on this. I will try to post a PR tomorrow.
I think I have a way to do this somewhat generically for all commands, so please hold off on working on this. I will try to post a PR tomorrow.
Sure.