ozone icon indicating copy to clipboard operation
ozone copied to clipboard

HDDS-6483. Change delete key on OMKey(s)Delete

Open kuenishi opened this issue 2 years ago • 3 comments

What changes were proposed in this pull request?

This patch changes:

  • The key format of delete table from full key to <timestamp-in-hex>-<transactionIndex-in-hex>
  • OzoneClient protocol of delete key & delete keys, adding timestamp to make the request timestamp deterministic (As the timestamp is added in preExecute(), so no change in client is needed)

While there is no need to change deletion service, this change resolves two existing issues. (1) The race condition of object deletion and deletion service, which may lead to blocks leaking (key infos deleted in OM while blocks remain in SCM), and (2) the order of keys and blocks to be deleted, from being deleted in alphabetical ordering of full keys, to being deleted in numerical ordering of timestamp-transactionIndex with which the keys deleted. That said, newly-deleted keys will be always deleted later, regardless of the key name.

To apply these changes to all APIs, the amount of work would become huge. So I split HDDS-5905 to several subtasks and this is the very first one, which fixes key format of APIs of deleting objects and files (There are a lot of cases where KeyInfos are inserted into the delete table). See my design doc for further details and updates since last discussion.

One of the caveats of this patch is that OmUtils.getKeyForDeleteTable() may not be practically fast, not because of retrieving timestamps, but formatting long values as strings. I do have benchmark numbers and a code snippet to fix it, but want to contribute as a separate JIRA ticket because the scope of this patch is already fairly large.

What is the link to the Apache JIRA

HDDS-6483 (Parent task: HDDS-5905)

How was this patch tested?

unit tests

kuenishi avatar Mar 23 '22 05:03 kuenishi

@errose28 Thank you for taking the review. I think now it's ready for review, as all tests has passed. If you want to discuss anything, please ping me in ASF Slack, mail, or community meeting on Thursday (Friday afternoon for me, though).

kuenishi avatar Apr 20 '22 07:04 kuenishi

Hey @kuenishi , I left one comment on JIAR HDDS-5905.

ChenSammi avatar Apr 24 '22 09:04 ChenSammi

Thanks. Responded in JIRA, too.

kuenishi avatar Apr 24 '22 22:04 kuenishi

@ChenSammi please respond to @kuenishi's latest comment on Jira

adoroszlai avatar Oct 26 '22 20:10 adoroszlai

@kuenishi please rebase this change, we will revisit the PR in the coming week.

kerneltime avatar Nov 22 '22 02:11 kerneltime

@kerneltime Yep, will do it. Thank you for heads up.

kuenishi avatar Nov 29 '22 09:11 kuenishi

@nandakumar131 Plz check, seems do not need much change for this issue fix

sumitagrawl avatar Nov 30 '22 12:11 sumitagrawl

As one of suggestion, we need a uniqueness for the deleteTable entry, i.e. unique key, so that it can be deleted easily, We can construct key using objectID as additional parameter in deleteTable, similar we are doing for bucketId and volumeId as part of Key for directory deletion.

So similar way, we can append objectId also as part of key to handle the scenario in key,

sumitagrawl avatar Dec 01 '22 07:12 sumitagrawl

I have force-pushed a rebased version. Please check it out.

@sumitagrawl Thank you for the suggestion. I think transaction log index would be enough to guarantee the uniqueness, and all deleted OmKeyInfo s in the transaction are included in the RepeatedOmKeyInfo object. There are some cases it is unclear which object id should be appended, such as deleting multiple open keys at once in OMOpenKeysDeleteRequest(WithFSO). In such cases, each OmKeyInfos have different object id.

kuenishi avatar Dec 01 '22 08:12 kuenishi

@kuenishi As solution in this PR, Every Request is having Key: timestamp+transactionId in deleted table and multiple keys are added to RepeatedOmKeyInfo. So one deletedKey have set of actual keys for deletion

Impact:

  1. keyLimitPerTask: In KeyDeletingService, this represents number of actual set of keys for deletion, now it will be set of request, which is further group of keys. This needs resolve.
  2. Same case required for DirectoriesPurgeRequest where its deleting keys for recursive directory
  3. I am not finding use case of adding timestamp, as transactionId is unique, I think should avoid duplicate request for same transactionId.

sumitagrawl avatar Dec 05 '22 07:12 sumitagrawl

@kuenishi Similar problem is present for Directory recursive, but can not follow above solution, This is because, key itself is having information of VolumeId and BucketId, but this is not present in value. This is required to identify further sub-directory and files (not the name, as bucket can be recreated and can be different for quota), And Need use VolumeId and BucketId. Ref: https://issues.apache.org/jira/browse/HDDS-7592

With above solution, we may loose VolumeId and BucketId which was present as part of key. Currently for Key, this is not used, its just get purged.

Please check how to avoid this information loose if required in future.

sumitagrawl avatar Dec 05 '22 16:12 sumitagrawl

keyLimitPerTask: In KeyDeletingService, this represents number of actual set of keys for deletion, now it will be set of request, which is further group of keys. This needs resolve. Same case required for DirectoriesPurgeRequest where its deleting keys for recursive directory

OK, I've get your idea of appending object Id to the key in delete table. I'll split them into multiple entries in the delete table.

I am not finding use case of adding timestamp, as transactionId is unique, I think should avoid duplicate request for same transactionId.

In case of non-HA setting, I saw a line of code comment saying that the transaction id is reset to 0 after the restart:

https://github.com/apache/ozone/blob/de42c614a88c9ca9a082c12625f3dff12508b648/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/WithObjectID.java#L101

I added a timestamp prefix to address this issue of transaction id being reset to 0 (and we have a non-HA OM, which will be updated to 1.3.0 soon). Isn't this the case any more?

kuenishi avatar Dec 06 '22 02:12 kuenishi