anndata
Allow dispatched write_elem without clearing Zarr store root
Please describe your wishes and possible alternatives to achieve the desired result.
Hi,
I have a use case in which I am using write_dispatched to perform partial writes of an AnnData object (i.e., I want to add new arrays at locations like adata.layers["counts"]) without having all existing keys loaded into memory.
The current write_elem implementation executes store.clear() at the root of the store:
https://github.com/scverse/anndata/blob/8e9eb882ddbef3fb2043a93d6d0553813dd2bc2b/src/anndata/_io/specs/registry.py#L348
I do not want this to happen, because I want to keep the existing store contents and only add to the existing on-disk data structure.
Can this store root clearing be made optional / possible to override when using write_dispatched and write_elem?
Possible alternatives
Currently I am doing this:
import zarr
from anndata.experimental import write_dispatched

# out_path, mode, adata and the write_chunked callback are defined elsewhere
z = zarr.open(out_path, mode=mode)
# Monkey patch the clear method to prevent clearing the root group
old_clear = z.clear
z.clear = lambda: None  # Do not allow clearing the root group
# Also patch __delitem__ so that the existing groups are not deleted
old_delitem = z.__class__.__delitem__
def patched_delitem(self, item):
    print(f"Attempting to delete {item}")
    if item == "/layers" or item == "/obsm":
        pass
    else:
        old_delitem(self, item)
z.__class__.__delitem__ = patched_delitem
write_dispatched(z, "/", adata, callback=write_chunked)
# Restore (though not really necessary)
z.clear = old_clear
z.__class__.__delitem__ = old_delitem
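The workaround above leaves the patches in place if the write raises. One way to make the restore reliable is to wrap the patching in a context manager. The following is only a sketch: protect_store is a hypothetical helper name (not part of anndata or zarr), and it assumes, as the snippet above does, that the group object accepts instance attribute assignment and that the keys to protect look like "/layers" and "/obsm".

```python
from contextlib import contextmanager


@contextmanager
def protect_store(group, protected=("/layers", "/obsm")):
    """Temporarily make group.clear() a no-op and ignore deletion of protected keys.

    Hypothetical helper: the name and the default keys are assumptions.
    """
    cls = type(group)
    old_delitem = cls.__delitem__

    def guarded_delitem(self, item):
        # Silently skip deletion of protected keys, delegate everything else.
        if item in protected:
            return
        old_delitem(self, item)

    # Shadow clear() on this instance only.
    group.clear = lambda: None
    # Note: this patches __delitem__ for every instance of the class,
    # exactly like the monkey patch it replaces.
    cls.__delitem__ = guarded_delitem
    try:
        yield group
    finally:
        # Always undo both patches, even if the write fails.
        del group.clear
        cls.__delitem__ = old_delitem
```

With this in place, the write would presumably become `with protect_store(z): write_dispatched(z, "/", adata, callback=write_chunked)`, and the original methods are restored on exit whether or not the write succeeds.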
Following a discussion on Slack: write_dispatched should not be used for overriding stores. Calling write_elem on the relevant subgroup does what I was looking for:
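import numpy as np
import zarr
from anndata.io import read_elem, write_elem
from anndata.tests.helpers import assert_equal, gen_adata

# Assumption: an in-memory group for illustration; any writable Zarr group should work
store = zarr.group()
adata = gen_adata((3, 2))
write_elem(store, "/", adata)  # initial full write of the AnnData object
layer = np.random.randn(3, 2)
adata.layers["array"] = layer
write_elem(store["layers"], "array", layer)  # add only the new layer to the existing store
assert_equal(read_elem(store), adata)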
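The same pattern (open the parent group, then write a single key into it) can be wrapped in a small helper that takes a path such as "layers/array". This is a rough sketch rather than anything anndata provides: split_elem_path and write_partial are hypothetical names, and the writer is passed in as a callable so the helper does not depend on a particular anndata version.

```python
def split_elem_path(path):
    """Split "layers/array" into the parent path "layers" and the key "array"."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        # Targeting "/" is the case that clears the whole store, so refuse it.
        raise ValueError("refusing to target the store root")
    return "/".join(parts[:-1]), parts[-1]


def write_partial(store, path, value, write_fn):
    """Write value at path by calling write_fn(parent_group, key, value).

    write_fn is expected to behave like anndata.io.write_elem (an assumption).
    """
    parent, key = split_elem_path(path)
    group = store[parent] if parent else store
    write_fn(group, key, value)
```

Under these assumptions, `write_partial(store, "layers/array", layer, write_elem)` would be equivalent to the `write_elem(store["layers"], "array", layer)` call above, and passing "/" raises instead of touching the root.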