anndata
Distributed writing fails for the h5ad format because h5py objects cannot be serialized
Please make sure these conditions are met
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of anndata.
- [x] (optional) I have confirmed this bug exists on the master branch of anndata.
Report
The following code fails when run against a dask.distributed cluster:
import anndata as ad
import dask.array as da
import dask.distributed as dd
with dd.LocalCluster(n_workers=1, threads_per_worker=1) as cluster:
    with dd.Client(cluster) as client:
        adata = ad.AnnData(da.random.random((100, 100), chunks=(10, 10)))
        adata.write_h5ad("test.h5ad")
Previously, the same code failed for both zarr and h5ad. This PR fixes the zarr case: https://github.com/scverse/anndata/pull/1079. For h5ad, the problem of serializing h5py objects might be overcome by doing whatever Xarray does, as mentioned in this issue: https://github.com/pydata/xarray/issues/4242
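To illustrate the general idea of a serializable write target, here is a minimal sketch. It is not what Xarray or anndata actually do: `ReopeningTarget`, `write_at`, and `demo` are hypothetical names, and a plain binary file stands in for an `h5py.File`. The point is only that an object which stores a path and reopens the file for each write can be pickled, while an open handle cannot.

```python
import os
import pickle
import tempfile


class ReopeningTarget:
    """Hypothetical picklable write target: keeps a path, not an open handle."""

    def __init__(self, path):
        self.path = path

    def write_at(self, offset, data):
        # Open, write, and close on every call, so no handle lives on the object.
        with open(self.path, "r+b") as f:
            f.seek(offset)
            f.write(data)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Preallocate 8 bytes, loosely analogous to creating an empty dataset.
        with open(path, "wb") as f:
            f.write(b"\x00" * 8)

        # An open file handle cannot be pickled (plain-file stand-in for h5py).
        with open(path, "r+b") as handle:
            try:
                pickle.dumps(handle)
                handle_picklable = True
            except TypeError:
                handle_picklable = False

        # A path-only wrapper survives a pickle round trip and can still write.
        target = pickle.loads(pickle.dumps(ReopeningTarget(path)))
        target.write_at(0, b"abcd")
        target.write_at(4, b"efgh")

        with open(path, "rb") as f:
            return handle_picklable, f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

A real h5py version would also have to coordinate writes across workers (for example with a lock shared through the scheduler), which this sketch leaves out.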
Traceback:
2023-08-25 11:17:10,491 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f6ae131c700>
0. 140097021523072
>.
Traceback (most recent call last):
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 29, in reducer_override
    return deserialize, serialize(obj)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/h5py.py", line 24, in serialize_h5py_dataset
    header, _ = serialize_h5py_file(x.file)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/distributed/protocol/h5py.py", line 11, in serialize_h5py_file
    raise ValueError("Can only serialize read-only h5py files")
ValueError: Can only serialize read-only h5py files

During handling of the above exception, another exception occurred:

...
    return Pickler.dump(self, obj)
  File "/home/sel/mambaforge/envs/dask/lib/python3.9/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
Versions
-----
anndata 0.10.0.dev198+ga61d5d4
dask 2023.7.1
distributed 2023.7.1
numpy 1.22.4
pandas 2.0.0
scipy 1.9.3
session_info 1.0.0
zarr 2.13.3
-----
PIL 9.2.0
asciitree NA
asttokens NA
attr 23.1.0
awkward 2.1.0
awkward_cpp NA
backcall 0.2.0
bokeh 2.4.3
cffi 1.15.1
click 8.1.3
cloudpickle 2.2.0
colorama 0.4.6
comm 0.1.1
cython_runtime NA
cytoolz 0.12.0
...
Python 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 15:55:03) [GCC 10.4.0]
Linux-6.1.44-1-MANJARO-x86_64-with-glibc2.38
-----
Session information updated at 2023-08-25 11:18