TypeError: self._hds cannot be converted to a Python object for pickling

Open arkanoid87 opened this issue 6 years ago • 3 comments

It seems that rasterio's _hds object is no longer serializable.

distributed.protocol.pickle - INFO - Failed to serialize ("('filled-2f9fe0560be0502eda038fa941309294', 0, 0)", <dask_rasterio.write.RasterioDataset object at 0x7f8f9deac828>, (slice(0, 748, None), slice(0, 22415, None)), <unlocked _thread.lock object at 0x7f8f9cb2af58>, False). Exception: self._hds cannot be converted to a Python object for pickling
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/miniconda3/envs/jupyter/lib/python3.6/site-packages/distributed/protocol/pickle.py in dumps(x)
     37     try:
---> 38         result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
     39         if len(result) < 1000:

~/miniconda3/envs/jupyter/lib/python3.6/site-packages/rasterio/_io.cpython-36m-x86_64-linux-gnu.so in rasterio._io.DatasetWriterBase.__reduce_cython__()

TypeError: self._hds cannot be converted to a Python object for pickling

arkanoid87, Feb 01 '19 10:02

Rasterio datasets can't be pickled and can't be shared between processes or threads. The workaround is to distribute dataset identifiers (paths or URIs) and then open them in new threads. See https://github.com/mapbox/rasterio/issues/1731.
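
A minimal sketch of that workaround, assuming a thread pool from the standard library. The names read_window and read_all, the file example.tif, and the window sizes are hypothetical and only illustrate the pattern: each task carries a path plus plain integers, and the worker opens its own dataset handle.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


def read_window(task):
    """Open the dataset inside the worker, read one window, and close it."""
    # Imported inside the function so that defining the tasks does not
    # require rasterio; only the worker that runs the read needs it.
    import rasterio
    from rasterio.windows import Window

    path, col_off, row_off, width, height = task
    with rasterio.open(path) as src:
        return src.read(1, window=Window(col_off, row_off, width, height))


def read_all(tasks, max_workers=4):
    """Run one read per task; every call opens its own dataset handle."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_window, tasks))


# Hypothetical tasks: a path and plain integers, never an open dataset.
tasks = [("example.tif", 0, 0, 256, 256), ("example.tif", 256, 0, 256, 256)]

# Unlike an open dataset, the task list survives a pickle round trip.
assert pickle.loads(pickle.dumps(tasks)) == tasks
```

Calling read_all(tasks) would need rasterio installed and a real file at the given path. Because the tasks hold only picklable values, the same shape should also work with a scheduler that has to serialize its arguments.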

sgillies, Jul 24 '19 20:07

@sgillies thanks for your input on this issue: https://github.com/corteva/rioxarray/pull/210

snowman2, Jan 20 '21 14:01

Intriguingly, the following code works.

import multiprocessing as mp
from pathlib import Path

import numpy as np
import rasterio
import rasterio.windows

def default_profile():
    return {
        "count": 1,
        "driver": "GTiff",
        "dtype": "float32",
        "nodata": -999999.0,
        "width": 100,
        "height": 100,
        "transform": rasterio.Affine(1.0, 0.0, 0.0, 0.0, 1.0, 0.0),
        "tiled": True,
        "interleave": "band",
        "compress": "lzw",
        "blockxsize": 256,
        "blockysize": 256,
    }

def read_dataset(dataset, window):
    dataset.read(window=window)
    print("Read dataset successfully.")

def write_dataset(dataset, pixels, window):
    dataset.write(pixels, window=window)
    print("Write dataset successfully.")

if __name__ == "__main__":
    # The "fork" start method is only available on POSIX systems.
    mp.set_start_method("fork")
    window = rasterio.windows.Window(col_off=0, row_off=0, width=20, height=20)
    # One band of 20x20 pixels, using the same dtype as the profile.
    pixels = np.ones((1, 20, 20), dtype="float32")
    # Use a separate name so the default_profile function is not shadowed.
    profile = default_profile()

    # test_read.tiff must already exist.
    with rasterio.open(Path("test_write.tiff"), mode="w", **profile) as dataset_write:
        with rasterio.open(Path("test_read.tiff"), mode="r") as dataset_read:

            p1 = mp.Process(target=read_dataset, args=(dataset_read, window))
            p2 = mp.Process(target=write_dataset, args=(dataset_write, pixels, window))

            p1.start()
            p2.start()
            p1.join()
            p2.join()

The output:

Read dataset successfully.
Write dataset successfully.

Am I missing something here? @sgillies

lionlai1989, Apr 22 '22 08:04