anndata

Allow dispatched write_elem without clearing Zarr store root

Open • keller-mark opened this issue 1 year ago • 1 comment

Please describe your wishes and possible alternatives to achieve the desired result.

Hi, I have a use case in which I am using write_dispatched to perform partial writes of an AnnData object (i.e., I want to add new arrays at locations like adata.layers["counts"]) without having all existing keys loaded into memory.

The current write_elem implementation calls store.clear() when writing to the root of the store:

https://github.com/scverse/anndata/blob/8e9eb882ddbef3fb2043a93d6d0553813dd2bc2b/src/anndata/_io/specs/registry.py#L348

I do not want this to happen, because I want to keep the existing store contents and only add to the existing on-disk data structure.
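To make the behavior concrete, here is a toy Python model of the write semantics described above. It is not anndata's code: plain dicts stand in for Zarr groups, and toy_write_elem is a hypothetical name. Writing at "/" clears the whole group before writing (the store.clear() call linked above), while writing at a child key is assumed to replace only that key, which is why targeting a subgroup leaves its siblings intact.

```python
from typing import Any, Dict


def toy_write_elem(group: Dict[str, Any], key: str, value: Any) -> None:
    """Toy model of root vs. child-key writes (hypothetical, not anndata's code).

    `group` is a plain dict standing in for a Zarr group.
    """
    if key == "/":
        # Root write: every existing key is wiped first, then the new
        # contents (a dict of elements in this toy) are written.
        group.clear()
        group.update(value)
    else:
        # Child write: only this key is removed (if present) and rewritten;
        # sibling keys are left untouched.
        group.pop(key, None)
        group[key] = value
```

For example, writing {"X": ..., "layers": {...}} at "/" and then writing "counts" into root["layers"] keeps both layers, whereas a second write at "/" discards everything that was there before.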

Could clearing the store root be made optional, or overridable, when using write_dispatched and write_elem?

Possible alternatives

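Currently I am doing this:

```python
import zarr
from anndata.experimental import write_dispatched

# out_path, mode, adata and the write_chunked callback are defined elsewhere (not shown)
z = zarr.open(out_path, mode=mode)

# Monkey patch the clear method to prevent clearing the root group
old_clear = z.clear
z.clear = lambda: None  # no-op: keep the existing root contents

# __delitem__ is looked up on the type, so it must be patched on the class
old_delitem = z.__class__.__delitem__

def patched_delitem(self, item):
    print(f"Attempting to delete {item}")
    if item in ("/layers", "/obsm"):
        return  # keep the existing layers/obsm groups
    old_delitem(self, item)

z.__class__.__delitem__ = patched_delitem

try:
    write_dispatched(z, "/", adata, callback=write_chunked)
finally:
    # Restore the originals; the class-level patch would otherwise affect every group
    z.clear = old_clear
    z.__class__.__delitem__ = old_delitem
```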
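If a monkey patch like this is kept as a stopgap, a context manager can at least guarantee that the original methods are restored even when the write raises. The sketch below uses only the standard library; preserve_existing is a hypothetical helper (not part of anndata or zarr), and it assumes, like the patch above, that the group accepts an instance attribute named clear. Because Python looks up __delitem__ on the type, that part of the patch is class-wide while active.

```python
from contextlib import contextmanager
from typing import Any, Iterable, Iterator


@contextmanager
def preserve_existing(group: Any, protected: Iterable[str]) -> Iterator[Any]:
    """Temporarily disable group.clear() and deletion of `protected` keys.

    Hypothetical helper, not part of anndata or zarr. Assumes `group` accepts
    an instance attribute named `clear` and does not already have one.
    """
    blocked = frozenset(protected)
    cls = type(group)
    # Remember whether the class defines __delitem__ itself or inherits it.
    had_own_delitem = "__delitem__" in vars(cls)
    original_delitem = cls.__delitem__

    def guarded_delitem(self: Any, item: Any) -> None:
        if item in blocked:
            return  # skip deleting protected keys
        original_delitem(self, item)

    # The instance attribute shadows the class method, so only this
    # object's clear() becomes a no-op.
    group.clear = lambda: None
    # Special methods are looked up on the type, so this patch is class-wide.
    cls.__delitem__ = guarded_delitem
    try:
        yield group
    finally:
        # Undo both patches even if the body raised.
        if had_own_delitem:
            cls.__delitem__ = original_delitem
        else:
            del cls.__delitem__  # fall back to the inherited method
        del group.clear
```

With a Zarr group it would be used as `with preserve_existing(z, {"/layers", "/obsm"}): write_dispatched(z, "/", adata, callback=write_chunked)`. It keeps the same hazard as the original patch (every instance of the class is affected while the block runs), but scopes it to the `with` block.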

keller-mark, Oct 02 '24 19:10

From Slack: write_dispatched should not be used to modify an existing store in place. Writing the new element directly with write_elem does what is sought:

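```python
import numpy as np
import zarr
from anndata.io import write_elem, read_elem
from anndata.tests.helpers import gen_adata, assert_equal

# Any Zarr group works here; "example.zarr" is only a placeholder path
store = zarr.open_group("example.zarr", mode="w")

adata = gen_adata((3, 2))
write_elem(store, "/", adata)  # full write at the root (clears it first)

# Later: add (or replace) a single layer without rewriting anything else
layer = np.random.randn(3, 2)
adata.layers["array"] = layer
write_elem(store["layers"], "array", layer)

assert_equal(read_elem(store), adata)
```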

ilan-gold, Oct 04 '24 14:10