adlfs
Xarray Serialisation Issues reading NetCDF from AzureBlobFile
I'm trying to read a NetCDF file in xarray and am running into serialisation issues.
The AzureBlobFile object contains a SimpleQueue, which is non-trivial to serialise. I suspect that fsspec should be handling the serialisation differently.
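For context, a `queue.SimpleQueue` cannot be pickled by the standard library, so any object that stores one as an attribute fails to pickle too. A common pattern for this kind of runtime-only state is to drop it in `__getstate__` and rebuild it in `__setstate__`. The sketch below uses a hypothetical `QueueHoldingFile` class as a stand-in; it is not the actual `AzureBlobFile` code.

```python
import pickle
import queue


class QueueHoldingFile:
    """Hypothetical stand-in for a file object that keeps a SimpleQueue."""

    def __init__(self, path):
        self.path = path
        # Runtime-only state: a SimpleQueue cannot be pickled.
        self._queue = queue.SimpleQueue()

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable queue.
        state = self.__dict__.copy()
        state.pop("_queue", None)
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes and rebuild a fresh, empty queue.
        self.__dict__.update(state)
        self._queue = queue.SimpleQueue()


def main():
    # A bare SimpleQueue fails to pickle with a TypeError.
    try:
        pickle.dumps(queue.SimpleQueue())
    except TypeError as exc:
        print("SimpleQueue:", exc)

    # The wrapper round-trips because the queue is excluded from its state.
    f = QueueHoldingFile("container/file.nc")
    restored = pickle.loads(pickle.dumps(f))
    print(restored.path, type(restored._queue).__name__)


if __name__ == "__main__":
    main()
```

Dropping the queue is only safe if it holds nothing that must survive the round trip: an unpickled object always starts with an empty queue.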
Simple Reproducer:
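```python
import fsspec
import xarray as xr
from distributed.protocol import serialize, ToPickle

# Credentials redacted.
storage_options = {'connection_string': '***', 'account_key': '***'}
fs = fsspec.filesystem('abfs', **storage_options)

url = "<CONTAINER_NAME>"
files = fs.ls(url)

# Open the first file lazily, with dask chunks, via the h5netcdf engine.
ds = xr.open_dataset(
    fs.open(files[0], 'rb'),
    chunks={'x': 2000, 'y': 2000},
    engine='h5netcdf',
)

# Pickle the dask graph of the first variable, roughly as distributed does
# when shipping tasks to workers; this is where serialisation fails.
serialize(ToPickle(list(ds.variables.values())[0]._data.dask))
```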
Can you post the full traceback? What object has a reference to the queue?
```
2024-06-13 12:48:57,917 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 2 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x31490b130>
 0. original-open_dataset-FSC-2bd87bcfc4ee55630c36125387cfd518
 1. open_dataset-FSC-2bd87bcfc4ee55630c36125387cfd518
>.
Traceback (most recent call last):
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
TypeError: cannot pickle 'weakref.ReferenceType' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
TypeError: cannot pickle 'weakref.ReferenceType' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/Users/arakowski/miniconda3/envs/pytorch-coiled/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
TypeError: cannot pickle 'weakref.ReferenceType' object
```
The unpicklable object is not always the 'weakref.ReferenceType' shown above: when doing something more realistic with the dataset than the simple reproducer, it sometimes shows up as a SimpleQueue instead.
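One plausible reason the reported type varies is that pickle's error message names only the innermost object it could not handle, not the chain of attributes leading to it, so whichever unpicklable object pickle reaches first is the one reported. A minimal sketch, using hypothetical nested dicts rather than real adlfs objects, shows this for both a weak reference and a SimpleQueue.

```python
import pickle
import queue
import weakref


class Target:
    """Any object that supports weak references."""


def pickle_error(obj):
    """Return the message of the error raised when pickling obj, or None."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError) as exc:
        return str(exc)
    return None


target = Target()
# Hypothetical nested state: the offending objects sit two levels deep.
state_with_ref = {"meta": {"cache": [weakref.ref(target)]}}
state_with_queue = {"meta": {"cache": [queue.SimpleQueue()]}}

# Each message names only the innermost unpicklable type, not the
# path meta -> cache -> [0] that leads to it.
print(pickle_error(state_with_ref))
print(pickle_error(state_with_queue))
```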
Thanks. We'll need to figure out which attributes of which objects aren't picklable. Some of these (like things from azure.storage.blob or azure.identity) might need to be pushed upstream. Others might need to be fixed here. Any research you can do here would be helpful.
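A rough way to do that research is to walk the object graph and report the innermost attributes or items that fail to pickle. The helper below, `find_unpicklable`, is a hypothetical debugging sketch, not part of adlfs, fsspec, or distributed. It only follows instance `__dict__`, dicts, lists, tuples and sets, so state held in `__slots__` or in C-level fields is reported at the owning object rather than at the field itself.

```python
import pickle
import queue
import weakref


def find_unpicklable(obj, path="obj", seen=None, max_depth=10):
    """Return paths of the innermost attributes/items that fail to pickle."""
    if seen is None:
        seen = set()
    # Guard against reference cycles and very deep graphs.
    if id(obj) in seen or max_depth < 0:
        return []
    seen.add(id(obj))

    try:
        pickle.dumps(obj)
        return []  # this object and everything it references pickle fine
    except Exception:
        pass

    # Enumerate child objects, each with a readable path.
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []

    found = []
    for child_path, child in children:
        found.extend(find_unpicklable(child, child_path, seen, max_depth - 1))
    # If no child explains the failure, this object itself is the culprit.
    return found or [path]


class Reader:
    """Hypothetical object holding both kinds of unpicklable state."""

    def __init__(self):
        self.path = "container/file.nc"
        self.state = {"queue": queue.SimpleQueue(), "owner": weakref.ref(self)}


if __name__ == "__main__":
    print(find_unpicklable(Reader()))
    # → ["obj.state['queue']", "obj.state['owner']"]
```

For the reproducer, a natural starting point would be something like `find_unpicklable(fs.open(files[0], 'rb'), path='f')`, which should list the AzureBlobFile attributes that hold the queue or weak references.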