HDDS-13963. Atomic Create-If-Not-Exists

Open peterxcli opened this issue 2 months ago • 5 comments

What changes were proposed in this pull request?

Extends the expectedDataGeneration logic in Ozone Manager to support atomic "create-if-not-exists" semantics.

*   Enables passing -1 as the expectedDataGeneration.
*   When set to -1, the validateAtomicRewrite logic (in both Create and Commit phases) strictly enforces that the target key must not exist, throwing KEY_ALREADY_EXISTS otherwise.
*   This establishes the core OM support required for conditional "If-None-Match" requests, allowing upper layers (like S3 Gateway) to implement these features with minimal changes to the underlying protocol.

The flow below shows how an S3 PUT request with an If-None-Match header can leverage this:

  1. S3 Gateway Layer
    1. Parse If-None-Match: *.
    2. Set expectedDataGeneration = -1.
    3. Pass to RpcClient.rewriteKey().
  2. OM Create Phase
    1. Validate expectedDataGeneration == -1.
    2. If key exists → throw KEY_ALREADY_EXISTS.
    3. Store -1 in open key metadata.
  3. OM Commit Phase
    1. Check expectedDataGeneration == -1 from open key.
    2. If key now exists (race condition) → throw KEY_ALREADY_EXISTS.
    3. Commit key.

Race Condition Handling: Using -1 ensures atomicity. If a concurrent write (Client B) commits between Client A's Create and Commit, Client A's commit fails the -1 validation check (key now exists), preserving strict create-if-not-exists semantics.
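
The Create/Commit flow and the race described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not the actual OM code: the class `CreateIfNotExistsSketch`, its two maps, the session ID, and the `createKey`/`commitKey`/`lookupKey` methods are hypothetical stand-ins; only the -1 sentinel, the `validateAtomicRewrite` name, and the KEY_ALREADY_EXISTS result come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the OM key table and open key table.
final class CreateIfNotExistsSketch {
    // Sentinel from the PR description: -1 means "the key must not exist".
    static final long MUST_NOT_EXIST = -1L;

    static final class KeyAlreadyExistsException extends RuntimeException {
        KeyAlreadyExistsException(String key) {
            super("KEY_ALREADY_EXISTS: " + key);
        }
    }

    // Open key metadata; the expected generation is stored at create time.
    private static final class OpenKey {
        final String name;
        final long expectedDataGeneration;

        OpenKey(String name, long expectedDataGeneration) {
            this.name = name;
            this.expectedDataGeneration = expectedDataGeneration;
        }
    }

    private final Map<String, String> keyTable = new HashMap<>();
    private final Map<Long, OpenKey> openKeyTable = new HashMap<>();
    private long nextSessionId = 1;

    // Shared check used by both phases: with -1, any committed key is a conflict.
    private void validateAtomicRewrite(String name, long expectedDataGeneration) {
        if (expectedDataGeneration == MUST_NOT_EXIST && keyTable.containsKey(name)) {
            throw new KeyAlreadyExistsException(name);
        }
    }

    // Create phase: validate, then remember -1 in the open key metadata.
    synchronized long createKey(String name, long expectedDataGeneration) {
        validateAtomicRewrite(name, expectedDataGeneration);
        long sessionId = nextSessionId++;
        openKeyTable.put(sessionId, new OpenKey(name, expectedDataGeneration));
        return sessionId;
    }

    // Commit phase: re-validate using the value stored at create time.
    synchronized void commitKey(long sessionId, String data) {
        OpenKey open = openKeyTable.remove(sessionId);
        if (open == null) {
            throw new IllegalStateException("unknown open key session " + sessionId);
        }
        validateAtomicRewrite(open.name, open.expectedDataGeneration);
        keyTable.put(open.name, data);
    }

    synchronized String lookupKey(String name) {
        return keyTable.get(name);
    }
}
```

In this model, if Client A and Client B both call `createKey("k", -1)` before either commits, both creates succeed. Whichever client calls `commitKey` first wins, and the other commit throws `KeyAlreadyExistsException` because the key now exists.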

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13963

How was this patch tested?

TODO

peterxcli avatar Nov 20 '25 04:11 peterxcli

@ivandika3 Could you take a look at whether this API change makes sense? Since rewrite is intended for existing keys, allowing a semantic like “don’t create if existed” can be confusing. On the other hand, adding another flag to the create-key request in the proto feels redundant. Do you have a better suggestion?

peterxcli avatar Nov 20 '25 04:11 peterxcli

Thanks @peterxcli for the patch, I'll take a look when I have time.

Involving @sodonnel since he's the original creator of Atomic rewrite.

ivandika3 avatar Nov 20 '25 04:11 ivandika3

Could you take a look at whether this API change makes sense? Since rewrite is intended for existing keys, allowing a semantic like “don’t create if existed” can be confusing. On the other hand, adding another flag to the create-key request in the proto feels redundant. Do you have a better suggestion?

I think this depends on whether atomic rewrite can be cleanly reused for S3 conditional requests. However, after another look, I think adding new optional KeyArgs attributes (e.g. allowOverwrite) might be better, since the atomic rewrite use case depends only on the update ID, while S3 conditional requests will need to check the ETag (which might need another KeyArgs attribute).

FYI, the concept of "generation" was loosely taken from GCP (https://docs.cloud.google.com/storage/docs/request-preconditions), which supports request preconditions based on both generation (GCP-specific) and ETag (S3-compatible).

ivandika3 avatar Nov 20 '25 05:11 ivandika3

whether atomic rewrite can be cleanly reused for S3 conditional requests.

@ivandika3 I opened https://github.com/apache/ozone/pull/9334 and included a design doc.

Main idea:

  • For the If-None-Match header: issue a rewrite-key request with expectedDataGeneration = -1 to provide “CREATE IF NOT EXISTS” semantics.
  • For the If-Match header: fetch the key info from OM, validate the ETag at S3G, then set expectedDataGeneration to the generation of the fetched key. This lets S3G perform optimistic concurrency control by leveraging OM’s expectedDataGeneration support during the rewrite.
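
A minimal sketch of how S3G could map the two headers to an expectedDataGeneration value, following the main idea above. Everything here is hypothetical and for illustration only: `ConditionalPutSketch`, `KeyInfo`, `PreconditionFailedException`, and the method signature are not taken from the Ozone code base or from the design doc, and the ETag comparison is deliberately simplified.

```java
import java.util.OptionalLong;

// Hypothetical helper; not part of the S3 Gateway code.
final class ConditionalPutSketch {
    // Sentinel meaning "the key must not exist".
    static final long MUST_NOT_EXIST = -1L;

    // Stand-in for an HTTP 412 Precondition Failed response.
    static final class PreconditionFailedException extends RuntimeException {
        PreconditionFailedException(String message) {
            super(message);
        }
    }

    // Stand-in for the key info fetched from OM (null when the key is absent).
    static final class KeyInfo {
        final String eTag;
        final long generation;

        KeyInfo(String eTag, long generation) {
            this.eTag = eTag;
            this.generation = generation;
        }
    }

    // Returns the value to send as expectedDataGeneration, or empty for an unconditional PUT.
    static OptionalLong expectedDataGeneration(String ifNoneMatch, String ifMatch, KeyInfo current) {
        if ("*".equals(ifNoneMatch)) {
            // If-None-Match: * means create only if the key does not exist.
            return OptionalLong.of(MUST_NOT_EXIST);
        }
        if (ifMatch != null) {
            // If-Match: validate the ETag at S3G, then pin the write to that generation.
            if (current == null || !stripQuotes(ifMatch).equals(current.eTag)) {
                throw new PreconditionFailedException("ETag does not match");
            }
            return OptionalLong.of(current.generation);
        }
        return OptionalLong.empty();
    }

    // ETags are usually sent as quoted strings; drop one pair of surrounding quotes.
    private static String stripQuotes(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }
}
```

The ETag check at S3G is only a pre-check. The value that guards against a concurrent overwrite is the generation passed to OM, which OM re-validates during the rewrite.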

peterxcli avatar Nov 20 '25 07:11 peterxcli

Thanks @peterxcli, left some comments in https://github.com/apache/ozone/pull/9334, let's discuss the design there.

ivandika3 avatar Nov 25 '25 05:11 ivandika3

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

github-actions[bot] avatar Dec 25 '25 00:12 github-actions[bot]