cf-python
Is this a bug, or user error: NotImplementedError: Dataset is not picklable
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cf
>>> cf.__version__
'3.16.1'
I'm attempting to use cf-python to read PP and write some netCDF. The code is:

    import cf
    import dask

    dask.config.set(scheduler='processes', num_workers=12)

    def convert(glob):
        ff = cf.read(glob)
        cf.write(ff, 'all_year.nc', mode='w')

    if __name__ == "__main__":
        convert('*.pp')
Platform is JASMIN sci6; the data is N1280 PP output.
Error log here
Hi Bryan,
A bit of digging suggests that this is a bug (https://github.com/pydata/xarray/issues/1464 has the details). However, the writing is locked anyway (a netCDF4-python restriction), so there shouldn't be any benefit in this case from running on 12 workers.
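To see why the 'processes' scheduler trips over this: it has to pickle each task (and the objects it closes over) to ship it to a worker process, and a netCDF4 `Dataset` wraps an open C-level file handle, which can't be pickled. Here's a minimal stdlib sketch of that failure mode (not cf-python's actual code; an ordinary open Python file object stands in for the dataset, since it is unpicklable for the same reason):

```python
import pickle

def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False

# An open file handle is OS-level state that can't be serialised,
# so the 'processes' scheduler can't ship it to a worker process.
with open(__file__) as fh:
    print(is_picklable(fh))         # False: open handles cannot be pickled
    print(is_picklable([1, 2, 3]))  # True: plain data round-trips fine
```

The default threaded scheduler avoids this entirely, because threads share the process's memory and nothing needs to be pickled.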
If you remove the dask.config.set(...) line, I suspect that it will work.
I shall make the fix, though, so that your original code doesn't fail.
Looking into how xarray deals with this (which I haven't wholly understood yet), it's probably not the 5-minute fix I dreamt of, but I'll keep at it ...
(Sorry, I was hoping that I would get benefit from the workers on the read, since the PP bit is slow)
OK - we can read PP/FF files in parallel, so if you did (ff[0] + 2).array the reads would be parallelised over Dask chunks. But writing is limited to one Dask chunk at a time (a netCDF4-python restriction), and a Dask chunk equates to one 2-d UM field, so there is no benefit from parallelism in the writing case :(
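That "parallel reads, serialised writes" pattern can be sketched generically with the stdlib (this is an illustration of the concept, not cf-python's internals; a `threading.Lock` stands in for the netCDF4-python write lock):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

write_lock = threading.Lock()  # stand-in for the netCDF4-python write lock
output = []

def process_chunk(chunk):
    # The "read + compute" part runs concurrently across chunks ...
    data = [x + 2 for x in chunk]
    # ... but the "write" part is serialised: only one chunk at a time
    # can hold the lock, so writing gains nothing from extra workers.
    with write_lock:
        output.append(sum(data))
    return sum(data)

# Three toy "chunks", each standing in for one 2-d UM field.
chunks = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(process_chunk, chunks))

print(results)  # [7, 11, 15]
```

The reads overlap freely; it's only the section inside the lock that becomes a single-file queue, which is why removing the process scheduler loses nothing on the write side.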