cf-python
Is this a bug, or user error: NotImplementedError: Dataset is not picklable
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cf
>>> cf.__version__
'3.16.1'
I'm attempting to use cf-python to read PP and write some netCDF. The code is:

    import cf
    import dask

    dask.config.set(scheduler='processes', num_workers=12)

    def convert(glob):
        ff = cf.read(glob)
        cf.write(ff, 'all_year.nc', mode='w')

    if __name__ == "__main__":
        convert('*.pp')
Platform is JASMIN sci6; the data is N1280 PP output.
Error log here
Hi Bryan,
A bit of digging suggests that this is a bug (https://github.com/pydata/xarray/issues/1464 has the details). However, the writing is locked anyway (a netCDF4-python restriction), so there shouldn't be any benefit in this case from running on 12 workers.
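To see why the 'processes' scheduler trips over this: it has to pickle each task (and the objects it closes over) to ship it to a worker process, and a netCDF4 `Dataset` wraps an open C-level file handle, which can't be pickled. Here's a minimal stdlib sketch of that failure mode (not cf-python's actual code; an ordinary open Python file object stands in for the dataset, since it is unpicklable for the same reason):

```python
import pickle

def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False

# An open file handle is OS-level state that can't be serialised,
# so the 'processes' scheduler can't ship it to a worker process.
with open(__file__) as fh:
    print(is_picklable(fh))         # False: open handles cannot be pickled
    print(is_picklable([1, 2, 3]))  # True: plain data round-trips fine
```

The default threaded scheduler avoids this entirely, because threads share the process's memory and nothing needs to be pickled.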
If you remove the dask.config.set(...) line, I suspect that it will work.
I shall make the fix, though, so that your original code doesn't fail.
Looking into how xarray deals with this (which I haven't wholly understood yet), it's probably not the 5-minute fix I dreamt of, but I'll keep at it ...
(Sorry, I was hoping that I would get benefit from the workers on the read, since the PP bit is slow)
OK - we can read PP/FF files in parallel, so if you did (ff[0] + 2).array the reads would be parallelised over Dask chunks. But writing is limited to one Dask chunk at a time (a netCDF4-python restriction), and a Dask chunk equates to one 2-d UM field, so there is no benefit from parallelism in the writing case :(
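That "parallel reads, serialised writes" pattern can be sketched generically with the stdlib (this is an illustration of the concept, not cf-python's internals; a `threading.Lock` stands in for the netCDF4-python write lock):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

write_lock = threading.Lock()  # stand-in for the netCDF4-python write lock
output = []

def process_chunk(chunk):
    # The "read + compute" part runs concurrently across chunks ...
    data = [x + 2 for x in chunk]
    # ... but the "write" part is serialised: only one chunk at a time
    # can hold the lock, so writing gains nothing from extra workers.
    with write_lock:
        output.append(sum(data))
    return sum(data)

# Three toy "chunks", each standing in for one 2-d UM field.
chunks = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(process_chunk, chunks))

print(results)  # [7, 11, 15]
```

The reads overlap freely; it's only the section inside the lock that becomes a single-file queue, which is why removing the process scheduler loses nothing on the write side.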