dask-rasterio
TypeError: self._hds cannot be converted to a Python object for pickling
It seems that rasterio's _hds object is no longer serializable.
distributed.protocol.pickle - INFO - Failed to serialize ("('filled-2f9fe0560be0502eda038fa941309294', 0, 0)", <dask_rasterio.write.RasterioDataset object at 0x7f8f9deac828>, (slice(0, 748, None), slice(0, 22415, None)), <unlocked _thread.lock object at 0x7f8f9cb2af58>, False). Exception: self._hds cannot be converted to a Python object for pickling
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/miniconda3/envs/jupyter/lib/python3.6/site-packages/distributed/protocol/pickle.py in dumps(x)
37 try:
---> 38 result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
39 if len(result) < 1000:
~/miniconda3/envs/jupyter/lib/python3.6/site-packages/rasterio/_io.cpython-36m-x86_64-linux-gnu.so in rasterio._io.DatasetWriterBase.__reduce_cython__()
TypeError: self._hds cannot be converted to a Python object for pickling
Rasterio datasets can't be pickled and can't be shared between processes or threads. The workaround is to distribute dataset identifiers (paths or URIs) and then open them in new threads. See https://github.com/mapbox/rasterio/issues/1731.
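A minimal sketch of that workaround, assuming a thread pool: each task receives only the path and a window, and opens its own dataset handle inside the worker. The names read_window, read_windows and the open_dataset parameter are hypothetical helpers introduced for illustration, not part of rasterio or dask-rasterio; open_dataset exists only so the pattern can be exercised without a real file.

```python
from concurrent.futures import ThreadPoolExecutor


def read_window(path, window, open_dataset=None):
    """Open the dataset inside the worker and read a single window."""
    if open_dataset is None:
        # Imported lazily; rasterio.open is the real opener in normal use.
        import rasterio
        open_dataset = rasterio.open
    # The handle is created and closed in this worker, never shared.
    with open_dataset(path) as dataset:
        return dataset.read(window=window)


def read_windows(path, windows, max_workers=4, open_dataset=None):
    """Distribute (path, window) pairs instead of an open dataset object."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(read_window, path, window, open_dataset)
            for window in windows
        ]
        # Results come back in the same order as the input windows.
        return [future.result() for future in futures]
```

With rasterio installed, a call might look like read_windows("test_read.tiff", [rasterio.windows.Window(0, 0, 20, 20)]). Only strings and window objects cross the task boundary, so nothing that wraps a GDAL handle needs to be pickled.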
@sgillies thanks for your input on this issue: https://github.com/corteva/rioxarray/pull/210
Intriguingly, the following code works.
import multiprocessing as mp
from pathlib import Path

import numpy as np
import rasterio
import rasterio.windows


def default_profile():
    return {
        "count": 1,
        "driver": "GTiff",
        "dtype": "float32",
        "nodata": -999999.0,
        "width": 100,
        "height": 100,
        "transform": rasterio.Affine(1.0, 0.0, 0.0, 0.0, 1.0, 0.0),
        "tiled": True,
        "interleave": "band",
        "compress": "lzw",
        "blockxsize": 256,
        "blockysize": 256,
    }


def read_dataset(dataset, window):
    dataset.read(window=window)
    # Message taken from the reported output below.
    print("Read dataset successfully.")


def write_dataset(dataset, pixels, window):
    dataset.write(pixels, window=window)
    # Message taken from the reported output below.
    print("Write dataset successfully.")


if __name__ == "__main__":
    mp.set_start_method("fork")
    window = rasterio.windows.Window(col_off=0, row_off=0, width=20, height=20)
    pixels = np.ones((1, 20, 20))
    # Renamed from default_profile so the function is not shadowed.
    profile = default_profile()
    # test_read.tiff must already exist; test_write.tiff is created here.
    with rasterio.open(Path("test_write.tiff"), mode="w", **profile) as dataset_write:
        with rasterio.open(Path("test_read.tiff"), mode="r") as dataset_read:
            # The open datasets themselves are passed to the child processes.
            p1 = mp.Process(target=read_dataset, args=(dataset_read, window))
            p2 = mp.Process(target=write_dataset, args=(dataset_write, pixels, window))
            p1.start()
            p2.start()
            p1.join()
            p2.join()
The output:
Read dataset successfully.
Write dataset successfully.
Am I missing something here? @sgillies
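One thing that may explain the difference, offered as a sketch rather than a confirmed answer: with the "fork" start method, multiprocessing does not pickle the arguments of a Process. The child simply inherits a copy of the parent's memory, so the pickling error never gets a chance to occur. The example below uses a threading.Lock as a stand-in for an unpicklable object; can_pickle, use_lock and run_in_forked_child are hypothetical helper names. It says nothing about whether using an inherited rasterio handle in a child process is safe, only why no TypeError is raised.

```python
import multiprocessing as mp
import pickle
import threading


def can_pickle(obj):
    """Return True if pickle.dumps accepts the object, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def use_lock(lock):
    # Runs in the child; the lock arrived via fork, not via pickling.
    with lock:
        pass


def run_in_forked_child(lock):
    """Pass an unpicklable object to a forked child and return its exit code."""
    ctx = mp.get_context("fork")  # only available on POSIX platforms
    proc = ctx.Process(target=use_lock, args=(lock,))
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    lock = threading.Lock()
    print("picklable:", can_pickle(lock))
    print("child exit code:", run_in_forked_child(lock))
```

Under the "spawn" start method, or with a dask distributed scheduler that serializes task arguments, the same object would have to be pickled, which is where the error in the original report comes from.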