s3proxy icon indicating copy to clipboard operation
s3proxy copied to clipboard

Add native support for Put-If-Match

Open gaul opened this issue 2 months ago • 6 comments

jclouds does not allow expressing the correct header so S3Proxy emulates this via a HEAD then PUT instead of a single atomic HEAD-and-PUT. One option would be to create a BlobStore2 class that has this support, currently only for Azure via the SDK, then emulates the others via jclouds. References #732. CC @klaudworks.

gaul avatar Oct 07 '25 23:10 gaul

Is your suggestion to keep the jcloud based operations for all other operations apart from the conditional writes in PUT operations?

Before we add some conditional logic to use jcloud / azure sdk for individiual operations it might be a good idea to transition towards decomissioning jcloud now that it is no longer maintained. In that case, it might be a better first step to restructure the code to be compatible with the old jcloud implementation as well as new storage backends that completely bypass jcloud. In that case we could migrate the azure storage backend off jcloud as a first step.

klaudworks avatar Oct 08 '25 10:10 klaudworks

Either using conditional logic in handlePutBlob or a BlobStore2 subclass is preferred if anyone wants a fix in the near term. The latter could grow a putBlobConditional(String container, Blob blob, PutOptions options, String eTag) or create a PutOptions2 subclass that has an eTag field.

S3Proxy is basically a thin layer around Apache jclouds so I don't see how one would transition a single provider to avoid using jclouds without tremendous code duplication for every operation. The piecewise SDK rewriting of providers such as azureblob-sdk demonstrates is the only feasible path given the large task of removing all the domain objects like BlobStore and Blob that are likely to drag out for multiple years. If you have other suggestions I am happy to entertain them but specifically to get the benefit of native conditional Put-If-Match I recommend choosing one of the paths in the first paragraph.

gaul avatar Oct 09 '25 06:10 gaul

I skimmed through the code a little more and understood that getting rid of jcloud would basically require a complete rewrite at the current point in time. I will put some thought into that.

If we want to implement the atomic conditional put operations for azureblob-sdk I would go with your proposed PutOptions2 solution:

We create a PutOptions2 subclass of PutOptions with the additional fields ifMatch and ifNoneMatch which store the etag. Then, we create PutOptions2 in the s3ProxyHandler which we can pass through the whole chain of middlewares instead of PutOptions. Then, AzureBlobStore.putBlob(..., PutOptions options) can just check if PutOptions is an instance of PutOptions2 and if so extract the Etag from ifMatch or ifNoneMatch and forward it to the sdk.

As far as I currently understand it, creating a BlobStore2 would require changing each middleware while this approach would basically narrow down code changes to the providers that supports it i.e. a minor change in azureblob-sdk.

... okay there is a slight issue: We need to keep the non-atomic logic for all other providers in the S3Proxyhandler and redundantly handle the conditional write in the azureblob-sdk provider to make it atomic.

So I actually don't need the atomic write for my use case. I was just interested in contributing here. Therefore, I don't want to add unnecessary complexity for supporting something that may not even be used by anyone. I'd be willing to implement a solution but also open to just closing this.

klaudworks avatar Oct 10 '25 19:10 klaudworks

Agree that BlobStore2 is unnecssary and just subclassing a PutOptions2 is the more straightforward solution and should only be a small change. I think it is fine to add a small conditional in S3ProxyHandler but I don't think it will be necessary the jclouds code will just ignore the subclass and AzureBlobStore will just check to see if options is a PutOptions2 then cast.

I think the atomic replacement is what people expect otherwise they could do the HeadObject/PutObject themselves. The next provider to transition to a SDK will be S3 which will broaden the use case but I'm not sure when I will have time to do this myself. Realistically it would be a week-long project for me but I want to work on other things.

gaul avatar Oct 10 '25 20:10 gaul

Okay, I see two ways to implement the conditional write then:

  1. throw away the non-atomic conditional put, create subclass PutOptions2, check in AzureBlobStore for PutOptions2 and execute atomic conditional put. -> I don't completely agree with people doing HeadObject/PutObject themselves. I implemented the conditional put in the first place because some Kafka backup tooling used this feature when storing events in S3-compatible storage.
  2. keep non-atomic conditional put logic by doing HeadObject+PutObject and for Azure the PutObject will be atomic which makes the initial HeadObject redundant.

Given that the conditional put was just released it's likely that noone uses it yet. Therefore, I'd prefer option 1 especially if we add another s3-sdk provider soon. Let me know what you think and I can create a pull request.

Regarding the S3 SDK. I might be down to implement it. Let me think about it before I commit :-)

klaudworks avatar Oct 10 '25 22:10 klaudworks

Realistically 99% of all S3Proxy use cases are Azure, S3, and the transient/filesystem storage backends. So once all three support conditional PutObject then the emulation doesn't make much sense to support and it is probably better to raise an HTTP 501 for the other 1% of users. In the interim I could be convinced to remove the emulation if Azure adds native support. I'm not exactly sure what to do for the transient/filesystem but I guess they would have to emulate this non-atomically.

gaul avatar Oct 11 '25 22:10 gaul