NetCDF load & save vs. dask multiprocessing
🐛 Bug Report
The dask multiprocessing scheduler seemingly doesn't work with iris: it raises `TypeError: cannot pickle '_thread.lock' object`.
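For context on the error (this is my reading, not part of the report): dask's `'processes'` scheduler must pickle each task to ship it to a worker process, and objects that hold a thread lock (such as wrappers around open file handles) cannot be pickled. A minimal stdlib-only sketch reproducing the same error message, with a hypothetical `LazyFileWrapper` class standing in for the kind of lock-holding object a lazy netCDF array keeps alive:

```python
import pickle
import threading

class LazyFileWrapper:
    """Hypothetical stand-in for an object guarding a file handle with a lock."""
    def __init__(self):
        self.lock = threading.Lock()

try:
    pickle.dumps(LazyFileWrapper())
except TypeError as exc:
    # Same error the 'processes' scheduler surfaces when it tries to
    # serialise the task graph for its worker processes.
    print(exc)  # cannot pickle '_thread.lock' object
```

The `'threads'` scheduler avoids this because threads share memory, so nothing needs to be pickled.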
How To Reproduce
```python
import warnings
from pathlib import Path

import dask
import iris


def load_save(in_fpath, out_folder, scheduler):
    print(scheduler)
    out_folder.mkdir(parents=True, exist_ok=True)
    cubes = iris.load(in_fpath)
    results = []
    for cube in cubes:
        results.append(
            iris.save(cube, out_folder / f'{scheduler}_{len(results)}.nc', compute=False))
    dask.compute(results, scheduler=scheduler)


if __name__ == '__main__':
    warnings.filterwarnings(action='ignore', message='Ignoring a datum in netCDF load')
    in_fpath = Path('omitted.nc')
    out_folder = Path('out')

    # This works.
    load_save(in_fpath, out_folder, scheduler='threads')

    # This doesn't work: cannot pickle '_thread.lock' object
    # Forcing the cube to load its data before saving seems to make it work for smaller files.
    load_save(in_fpath, out_folder, scheduler='processes')
```
Environment
- OS & Version: scitools/default on internal VDI
- Iris Version: 3.7.0