
HDDS-7126. Clean deletedBlock records of deleted containers

Open symious opened this issue 2 years ago • 9 comments

What changes were proposed in this pull request?

By default, DeletedBlockLog fetches 20,000 records per iteration and sends the deleteBlock command to datanodes.

The deletedTable DB can accumulate many records that belong to already-deleted containers. These records are never cleaned up, so fetch iterations can go idle on them, which causes SCM to fail to send commands to datanodes.

This ticket cleans up the records of deleted containers so that SCM's delete block operation can recover.
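As a rough illustration of the idea (not the actual patch), the sketch below scans a bounded number of records per iteration and drops any record whose container is already deleted. The class `DeletedRecordCleanupSketch`, the `TxRecord` type, the in-memory map standing in for the table, and the `scanLimit` parameter are all hypothetical names made up for this example. Live records stay in the map here, since in this simplified model they would only be removed once their transaction is committed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; not Ozone's real classes or APIs.
class DeletedRecordCleanupSketch {

    // Stand-in for one record in the deleted block table.
    static final class TxRecord {
        final long txId;
        final long containerId;

        TxRecord(long txId, long containerId) {
            this.txId = txId;
            this.containerId = containerId;
        }
    }

    // Scans at most scanLimit records in table order. Records of deleted
    // containers are removed from the table; all others are returned so a
    // delete command can be sent for them.
    static List<TxRecord> fetchBatch(Map<Long, TxRecord> table,
                                     Set<Long> deletedContainers,
                                     int scanLimit) {
        List<TxRecord> toSend = new ArrayList<>();
        Iterator<TxRecord> it = table.values().iterator();
        int scanned = 0;
        while (it.hasNext() && scanned < scanLimit) {
            TxRecord rec = it.next();
            scanned++;
            if (deletedContainers.contains(rec.containerId)) {
                it.remove(); // stale record: clean it up instead of skipping it forever
            } else {
                toSend.add(rec);
            }
        }
        return toSend;
    }
}
```

Without the `it.remove()` branch, a table whose first `scanLimit` records all belong to deleted containers would yield an empty batch on every iteration, which is the stuck behaviour described above.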

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-7126

How was this patch tested?

unit test.

symious avatar Aug 15 '22 11:08 symious

Hi @symious, my understanding of the change here is that containers that have been deleted will not need individual blocks deleted anymore. However, the requirement for a container to be deleted is that all replicas report zero blocks/keys. See LegacyReplicationManager#deleteContainerReplicas. This means all block deletes for that replica should have been processed and removed from the log. Is there still a situation where deleted containers can end up in the DeletedBlockLog?

errose28 avatar Aug 15 '22 22:08 errose28

@errose28 Thanks for the review.

I see the container state is updated in LegacyReplicationManager#deleteContainerReplicas, but I didn't find the remove operation on records in DeletedTable. Please correct me if I'm wrong.

symious avatar Aug 16 '22 00:08 symious

Right, I believe the reason deleted containers are not explicitly removed from block delete processing is that a container cannot be deleted until all blocks from all its replicas are deleted. This should imply that the deleted block log has no more entries for that container.

  1. DeletedBlockLogImpl#commitTransactions checks that all replicas have processed a block deletion for a container before removing the entry.
  2. LegacyReplicationManager#deleteContainerReplicas checks that all replicas have deleted all blocks.

Since the container cannot be deleted until all blocks have been deleted, the deleted block log/table should not have any entries for the container by the time it is deleted. Do you see a case where this could fail and there are lingering delete block entries for a container which has already been deleted?
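To make the two checks concrete, here is a minimal in-memory sketch of the invariant as I understand it. It is a simplified model with hypothetical names (`CommitTrackingSketch`, `commit`, `canDeleteContainer`), not the real DeletedBlockLogImpl or LegacyReplicationManager code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the two checks; not Ozone's real classes.
class CommitTrackingSketch {
    // txId -> datanodes that have acknowledged the block deletion.
    private final Map<Long, Set<String>> acks = new HashMap<>();
    // txIds that still have an entry in the deleted block log.
    private final Set<Long> log = new HashSet<>();

    void addTransaction(long txId) {
        log.add(txId);
    }

    boolean contains(long txId) {
        return log.contains(txId);
    }

    // Check 1: record an ack and remove the entry only once every replica
    // of the container has acknowledged. Returns true if the entry was removed.
    boolean commit(long txId, String datanode, Set<String> replicas) {
        if (!log.contains(txId)) {
            return false;
        }
        Set<String> acked = acks.computeIfAbsent(txId, k -> new HashSet<>());
        acked.add(datanode);
        if (acked.containsAll(replicas)) {
            log.remove(txId);
            acks.remove(txId);
            return true;
        }
        return false;
    }

    // Check 2: the container may be deleted only when every replica
    // reports zero blocks.
    static boolean canDeleteContainer(Map<String, Long> blockCountByReplica) {
        return blockCountByReplica.values().stream().allMatch(c -> c == 0L);
    }
}
```

In this model the log entry disappears only after the last replica acknowledges, and the container only becomes deletable once every replica reports zero blocks. The two checks run independently of each other, though, so the model by itself does not force one to happen before the other.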

errose28 avatar Aug 16 '22 18:08 errose28

I haven't looked into why the inconsistency occurs. Might it be related to the container report from the Datanode?

Anyway, the issue is causing our Prod cluster to get stuck deleting blocks, and the PR helps to resolve it.

symious avatar Aug 17 '22 12:08 symious

Sorry for the late response here. It's good to hear that this patch resolves the observed issue, but we should really try to understand why deleted containers are ending up in the deleted block log to make sure we have the correct fix. I would like to investigate this but have not had the time. @symious, if you have time, perhaps you could look into how this case occurs as well, which will help us validate this fix.

errose28 avatar Aug 27 '22 00:08 errose28

@swamirishi will be looking at a way to reproduce this issue.

errose28 avatar Aug 31 '22 18:08 errose28

@errose28 @symious Deleted containers can occur in the list of DeleteBlockTransactions. The flow when we try to delete blocks is as follows:

  1. Blocks from a DeleteBlockTransaction are picked up and marked for deletion, and a DeleteCommandStatus response is returned.
  2. Based on this response, SCM tries to commit the DeleteBlockTransaction. The record is removed from the table only if min(# Replicas Deleted, Total Number of Replicas) >= Replication Factor, so in the under-replicated case it is not removed.
  3. When the blocks are deleted, the InMemoryBlockCounters are reduced on the datanode side, and these counters are sent along with the heartbeat.
  4. On the SCM side, LegacyReplicationManager checks whether max(# of blocks) across the closed container replicas is 0. If it is 0, the container is deleted.

Since committing the block deletion transaction and the actual block deletion are asynchronous, a record can stay in the deleteBlockTransaction table even after its container has been deleted. The above patch fixes this issue.
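A small sketch of the two conditions may help show how they can disagree. The class and method names below are hypothetical and only encode the two formulas from this comment; they are not taken from the Ozone code base.

```java
import java.util.List;

// Hypothetical helpers that only encode the two conditions described above.
class StaleRecordRaceSketch {

    // SCM removes the transaction record only if
    // min(replicas that deleted, total replicas) >= replication factor.
    static boolean canRemoveRecord(int replicasDeleted, int totalReplicas,
                                   int replicationFactor) {
        return Math.min(replicasDeleted, totalReplicas) >= replicationFactor;
    }

    // The container is deleted when the maximum block count reported
    // across its closed replicas is 0.
    static boolean canDeleteContainer(List<Long> blockCounts) {
        long max = 0;
        for (long count : blockCounts) {
            max = Math.max(max, count);
        }
        return max == 0;
    }
}
```

For example, assume an under-replicated container with 2 replicas and a replication factor of 3. If both replicas delete their blocks, `canRemoveRecord(2, 2, 3)` is false, while `canDeleteContainer(List.of(0L, 0L))` is true, so the container can be deleted while its record is still in the table.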

swamirishi avatar Sep 23 '22 18:09 swamirishi

Thanks for the analysis @swamirishi. Coordinating between the deleted block log and the replication manager will be challenging, so I think removing DELETED containers from the deleted block log when they are encountered, as this patch does, should be a good fix. I will review this PR. @symious can you please resolve the merge conflict?

errose28 avatar Sep 23 '22 19:09 errose28

@errose28 @swamirishi Thank you for the review. PR updated.

symious avatar Sep 24 '22 23:09 symious