API for conditional / exclusive write

Open TomAugspurger opened this issue 1 year ago • 3 comments

Over in https://github.com/zarr-developers/zarr-python/pull/2262, we'd like to write a file but only if it doesn't already exist. On a local file system, this would be open(path, mode="xb"), which will fail with a FileExistsError if the file already exists.
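For reference, a minimal sketch of the local behaviour described above, using only the standard library. The helper name write_exclusive, the temporary directory and the file name are made up for illustration.

```python
import os
import tempfile


def write_exclusive(path: str, data: bytes) -> bool:
    """Write data to path only if it does not exist yet; True on success."""
    try:
        # "x" requests exclusive creation: open() raises FileExistsError
        # if something already exists at path.
        with open(path, mode="xb") as f:
            f.write(data)
    except FileExistsError:
        return False
    return True


tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, "chunk.bin")

first = write_exclusive(target, b"first")    # file is created
second = write_exclusive(target, b"second")  # refused, file left untouched
```

The second call leaves the original contents in place, which is the behaviour we would want from the remote backends too.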

Now that S3 supports conditional writes, it should be possible to implement this for s3fs, gcsfs (if_generation_match=0), and adlfs (overwrite=False).

Would there be any appetite for standardizing this behavior? I'm not sure what API is best, but I lean towards something like an overwrite: bool parameter on pipe and similar methods. We could also try to support mode="xb" in some open-like methods, but I'm less sure about that.
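To make the shape of the proposal concrete, here is a rough sketch of what an overwrite flag could look like on a pipe-style method. LocalSketchFS is a hypothetical stand-in backed by the local disk, not an fsspec class, and the signature is only an assumption about one possible API.

```python
import os
import tempfile


class LocalSketchFS:
    """Hypothetical stand-in for a filesystem class; not fsspec's API."""

    def pipe_file(self, path: str, value: bytes, overwrite: bool = True) -> None:
        # overwrite=True keeps the usual behaviour (truncate and write).
        # overwrite=False maps onto exclusive creation for a local file.
        mode = "wb" if overwrite else "xb"
        with open(path, mode) as f:
            f.write(value)


fs = LocalSketchFS()
path = os.path.join(tempfile.mkdtemp(), "zarr.json")

fs.pipe_file(path, b"v1")  # creates the file
fs.pipe_file(path, b"v2")  # default: replaces the contents

try:
    fs.pipe_file(path, b"v3", overwrite=False)
    raised = False
except FileExistsError:
    raised = True

with open(path, "rb") as f:
    final = f.read()
```

Defaulting to overwrite=True would leave existing callers unaffected; a remote backend would translate overwrite=False into whatever precondition its store offers.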

TomAugspurger avatar Sep 27 '24 01:09 TomAugspurger

If this is only to apply to open, then mode= would be fine, and the check would probably happen at open time. But I think you mean for methods like put/pipe, right? A bool argument on those methods and their one-file variants would be enough.

A couple of thoughts:

  • how does this interact with on_error when trying to write multiple files to a remote? Is it treated like any other IO error? Probably yes, so the other files would still get written (concurrently); this would not act as a lock on the whole operation
  • on S3, I assume you are looking at If-None-Match; is if_generation_match=0 really the same, or does it mean "if no such filename ever existed"?
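The first point could be sketched like this, treating an existing target as an ordinary per-file IO error. pipe_many is a hypothetical helper on the local disk; the on_error values "raise" and "return" are assumptions for this sketch, and the loop is sequential where a real backend would write concurrently.

```python
import os
import tempfile


def pipe_many(files: dict, overwrite: bool = True, on_error: str = "raise") -> dict:
    """Hypothetical multi-file write; returns {path: None or the exception}."""
    results = {}
    for path, value in files.items():
        try:
            with open(path, "wb" if overwrite else "xb") as f:
                f.write(value)
            results[path] = None
        except OSError as exc:  # FileExistsError is a subclass of OSError
            if on_error == "raise":
                raise
            # on_error="return": record the failure and carry on with the rest
            results[path] = exc
    return results


root = tempfile.mkdtemp()
path_a = os.path.join(root, "a")
path_b = os.path.join(root, "b")
with open(path_a, "wb") as f:
    f.write(b"old")  # path_a already exists before the batch write

out = pipe_many({path_a: b"new", path_b: b"new"}, overwrite=False, on_error="return")
```

Here path_a is rejected and keeps its old contents while path_b is still written, so the condition applies per file and not to the batch as a whole.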

martindurant avatar Sep 29 '24 12:09 martindurant

Do you know how this interacts with multi-part-uploads, where although many bytes might have been sent, the file is not really written to the remote path location until a final commit? At what point is the exists condition applied?

martindurant avatar Oct 01 '24 14:10 martindurant

I'm not sure offhand.

TomAugspurger avatar Oct 02 '24 19:10 TomAugspurger