
NetCDF load & save vs dask multiprocessing

bblay opened this issue 7 months ago · 1 comment

🐛 Bug Report

The dask multiprocessing ('processes') scheduler seemingly doesn't work with iris's deferred NetCDF saving: computing the delayed saves raises TypeError: cannot pickle '_thread.lock' object.
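
For context, my understanding (an assumption on my part, not confirmed against the iris internals): the processes scheduler has to pickle the task graph to send it to worker processes, and the lazy cube data / deferred save presumably captures a NetCDF file handle guarded by a thread lock. A bare thread lock is indeed not picklable:

import pickle
import threading

# Reproduces the same error message in isolation.
pickle.dumps(threading.Lock())  # TypeError: cannot pickle '_thread.lock' object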

How To Reproduce

import warnings
from pathlib import Path

import dask
import iris


def load_save(in_fpath, out_folder, scheduler):
    print(scheduler)
    out_folder.mkdir(parents=True, exist_ok=True)

    cubes = iris.load(in_fpath)  # lazy load
    results = []
    for cube in cubes:
        # compute=False defers the save, returning a delayed object to compute later.
        results.append(
            iris.save(cube, out_folder / f'{scheduler}_{len(results)}.nc', compute=False))

    dask.compute(results, scheduler=scheduler)


if __name__ == '__main__':
    warnings.filterwarnings(action='ignore', message='Ignoring a datum in netCDF load')

    in_fpath = Path('omitted.nc')
    out_folder = Path('out')

    # This works.
    load_save(in_fpath, out_folder, scheduler='threads')

    # This doesn't work: cannot pickle '_thread.lock' object.
    # Forcing the cube to load its data before saving seems to make it work
    # for smaller files (see the sketch after this script).
    load_save(in_fpath, out_folder, scheduler='processes')
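
For reference, a minimal sketch of that workaround. The function name load_save_realised is hypothetical, and it is an assumption on my part that touching cube.data is the relevant step: it realises the lazy data in memory, so the deferred save no longer references the lazy NetCDF-backed array.

def load_save_realised(in_fpath, out_folder, scheduler):
    out_folder.mkdir(parents=True, exist_ok=True)

    cubes = iris.load(in_fpath)
    results = []
    for cube in cubes:
        # Accessing .data pulls the lazy (dask) array into memory.
        _ = cube.data
        results.append(
            iris.save(cube, out_folder / f'{scheduler}_{len(results)}.nc', compute=False))

    dask.compute(results, scheduler=scheduler)

This is of course only practical when the data fits in memory, which presumably is why it only helps for smaller files.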

Environment

  • OS & Version: scitools/default on internal VDI
  • Iris Version: 3.7.0

bblay · Jul 11 '24 13:07