zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

fix file modes

Open brokkoli71 opened this issue 1 year ago • 5 comments

fixes https://github.com/zarr-developers/zarr-python/issues/1978 according to docstrings of zarr.open the modes should do:

mode : {'r', 'r+', 'a', 'w', 'w-'}, optional
        Persistence mode: 'r' means read only (must exist); 'r+' means
        read/write (must exist); 'a' means read/write (create if doesn't
        exist); 'w' means create (overwrite if exists); 'w-' means create
        (fail if exists).

this PR implements and tests this behavior, so https://github.com/zarr-developers/zarr-python/issues/1978 gets fixed

brokkoli71 avatar Jun 27 '24 13:06 brokkoli71

Thanks @brokkoli71 for picking this up. What is the expected behavior of mode='w'? Should we have a Store.clear() method that we call in these cases?

jhamman avatar Jul 01 '24 22:07 jhamman

Thanks @brokkoli71 for picking this up. What is the expected behavior of mode='w'? Should we have a Store.clear() method that we call in these cases?

good point, according to the docstrings: 'w' means create (overwrite if exists). So yes, would be probably a good idea to have a generalized Store.clear. I will have a look at that

brokkoli71 avatar Jul 02 '24 09:07 brokkoli71

I agree that a method for clearing a store is a good idea. Just bear in mind that this can be a performance bottleneck when deleting arrays with a lot of chunks, especially on remote storage. We should think about performance at some point.

d-v-b avatar Jul 02 '24 10:07 d-v-b

I agree that a method for clearing a store is a good idea. Just bear in mind that this can be a performance bottleneck when deleting arrays with a lot of chunks, especially on remote storage. We should think about performance at some point.

I was wondering, if supporting mode='w' as create (overwrite if exists) is a good idea concerning the performance? what is the usecase of this mode? maybe it would be better to explicitly have a delete or clear method? or we should have a warning in the docstrings?

brokkoli71 avatar Jul 03 '24 10:07 brokkoli71

I agree that a method for clearing a store is a good idea. Just bear in mind that this can be a performance bottleneck when deleting arrays with a lot of chunks, especially on remote storage. We should think about performance at some point.

I was wondering, if supporting mode='w' as create (overwrite if exists) is a good idea concerning the performance? what is the usecase of this mode? maybe it would be better to explicitly have a delete or clear method? or we should have a warning in the docstrings?

There's a lot of usage for create + overwrite -- if users are rapidly iterating on something, they already created an old array, but they want to save a new array with the same name + different datatype, then the easiest way to do this is via something like an overwrite option. Handling this via array the creation flow is more ergonomic than forcing users to go through a separate API.

Don't worry too much about the performance concerns with bulk deletion for now. We should make the API convenient, then worry about performance later. I only brought delete performance up because it's something that zarr python v2 didn't deal with, which caused me some headaches.

d-v-b avatar Jul 03 '24 10:07 d-v-b

@brokkoli71 - happy to merge this if you can resolve the conflicts.

jhamman avatar Jul 25 '24 19:07 jhamman

Sadly still conflicting, @brokkoli71

joshmoore avatar Jul 26 '24 15:07 joshmoore