cfgrib
Opening files with xarray's open_mfdataset and parallel=True fails unless the same files have previously been opened with parallel=False in the same session.
Minimal repro:
import xarray as xr
ds = xr.open_mfdataset('gfs.0p25.201511*00.f0*.grib2', engine='cfgrib', combine='nested', concat_dim=['step'], parallel=True, chunks=24, backend_kwargs={'filter_by_keys': {'typeOfLevel': 'surface'}, 'indexpath': ''})
Expected result: an xarray Dataset is returned.
Actual result:
ECCODES ERROR : grib_handle_create: cannot create handle, no definitions found
ecCodes assertion failed: `h' in /home/conda/feedstock_root/build_artifacts/eccodes_1570714279314/work/src/grib_query.c:529
Note: if you have previously opened the files with parallel=False in the same session/kernel, the above will pass, so the repro needs to happen in a fresh session. This was executed on a local dask cluster.
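The "passes after a serial open in the same session" symptom suggests a workaround pattern: perform one open serially so any fragile one-time setup completes, then fan out in parallel. A minimal runnable sketch of that pattern; open_grib here is a hypothetical stand-in for the real xr.open_dataset(..., engine='cfgrib') call, not cfgrib's API:

```python
from concurrent.futures import ThreadPoolExecutor

def open_grib(path):
    # placeholder for the real cfgrib open; returns a token for illustration
    return ("ds", path)

paths = ["gfs.f000.grib2", "gfs.f003.grib2", "gfs.f006.grib2"]

# 1) one serial open first, so any global one-time setup (e.g. ecCodes
#    loading its definition files) happens without concurrency
first = open_grib(paths[0])

# 2) the remaining files can then be opened in parallel
with ThreadPoolExecutor(max_workers=4) as pool:
    rest = list(pool.map(open_grib, paths[1:]))

datasets = [first] + rest
print(len(datasets))  # 3
```

Whether this helps depends on the root cause; it only sidesteps races on first-use initialization.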
I can confirm this bug report with a different dataset and different error messages. With parallel=False, open_mfdataset always works:
>>> import cfgrib
>>> import xarray as xr
>>> print(xr.__version__, cfgrib.__version__)
0.13.0 0.9.7.4.dev0
>>> ds = xr.open_mfdataset('step*.grib', engine='cfgrib', concat_dim=['step'], combine='nested', parallel=False)
>>> ds
<xarray.Dataset>
Dimensions: (latitude: 1801, longitude: 3600, step: 3)
Coordinates:
time datetime64[ns] 2019-04-01
number int64 0
surface int64 0
* latitude (latitude) float64 90.0 89.9 89.8 89.7 ... -89.8 -89.9 -90.0
* longitude (longitude) float64 0.0 0.1 0.2 0.3 ... 359.6 359.7 359.8 359.9
* step (step) timedelta64[ns] 01:00:00 02:00:00 03:00:00
valid_time (step) datetime64[ns] 2019-04-01T01:00:00 ... 2019-04-01T03:00:00
Data variables:
t2m (step, latitude, longitude) float32 dask.array<chunksize=(1, 1801, 3600), meta=np.ndarray>
Attributes:
GRIB_edition: 1
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2019-11-11T19:19:05 GRIB to CDM+CF via cfgrib-0....
Restarting the kernel and running with parallel=True always crashes Python inside ecCodes, but with a few different error messages. I observed at least:
ECCODES ERROR : Unable to find boot.def. Context path=/Users/amici/.conda/envs/ECM/share/eccodes/definitions
Possible causes:
- The software is not correctly installed
- The environment variable ECCODES_DEFINITION_PATH is defined but incorrect
ecCodes assertion failed: `0' in /usr/local/miniconda/conda-bld/eccodes_1566402639979/work/src/grib_context.c:205
ECCODES ERROR : grib_handle_create: cannot create handle, no definitions found
ecCodes assertion failed: `h' in /usr/local/miniconda/conda-bld/eccodes_1566402639979/work/src/grib_query.c:458
ECCODES ERROR : grib_parser: syntax error at line 34 of /Users/amici/.conda/envs/ECM/share/eccodes/definitions/boot.def
ECCODES ERROR : ecCodes Version: 2.13.1
and
ECCODES ERROR : ecCodes Version: 2.13.1
ecCodes Version: 2.13.1
Definition files path: /Users/amici/.conda/envs/ECM/share/eccodes/definitions
ECCODES ERROR : grib_parser_include: Could not resolve 'ECCODES_USE_' (included in /Users/amici/.conda/envs/ECM/share/eccodes/definitions/boot.def)
ecCodes assertion failed: `0' in /usr/local/miniconda/conda-bld/eccodes_1566402639979/work/src/grib_context.c:205
It looks like a locking/threading problem, @shahramn do you have any hint?
Any update on this @shahramn @alexamici, or some idea of how deep the problem goes? I just updated cfgrib, eccodes, python-eccodes, dask and xarray through conda-forge and retried the minimal code above, with the same issue:
>>> import cfgrib
>>> import xarray as xr
>>> import eccodes
>>> import dask
>>> print(cfgrib.__version__, xr.__version__, eccodes.__version__, dask.__version__)
0.9.8.1 0.15.1 2.17.0 2.14.0
>>> ds = xr.open_mfdataset('icon-eu-eps_europe_icosahedral_single-level_2019121918_*_t_2m.grib2',
engine='cfgrib', combine='nested', concat_dim=['step'], parallel=True,
backend_kwargs={'indexpath': ''})
ECCODES ERROR : grib_handle_create: cannot create handle, no definitions found
ecCodes assertion failed: `h' in /home/conda/feedstock_root/build_artifacts/eccodes_1583917083369/work/src/grib_query.c:568
Aborted (core dumped)
I am also reproducing this error, while using:
blah = dask.delayed(cfgrib.open_datasets)(file_name, backend_kwargs={'indexpath': ''}, cache=False, chunks={})
blah = client.compute(blah)
Expected result: a list of xarray Datasets. Actual result: a Dask KilledWorker error. The worker log files list the following:
ECCODES ERROR : grib_handle_create: cannot create handle, no definitions found
ecCodes assertion failed: `h' in /home/conda/feedstock_root/build_artifacts/eccodes_1593014857650/work/src/grib_query.c:572
The files open fine when run eagerly, i.e. without dask.delayed.
Any work arounds?
I tried some additional checks. If the files are opened straight into memory, i.e. blah = dask.delayed(cfgrib.open_datasets)(file, backend_kwargs={'indexpath': ''}), then it works. The problem seems to be specifically with opening the data as a dask.array rather than loading it into memory; the parallelization itself doesn't seem to be the problem. Hope this extra information helps narrow it down.
I can confirm this is still here on xarray 0.16.1 and cfgrib 0.9.8.4. For now I'm using parallel=False, but it takes about 3 times longer than with parallel=True. The problem is that, when opening the files for the first time with parallel=True, eccodes throws an error to cfgrib, which is then unable to write the idx files. The error you then see in Python is due to the empty idx files.
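If the failure has already left empty .idx sidecar files behind, later opens can keep failing even serially, so clearing them out before retrying is a sensible first step. A small runnable sketch, assuming the common cfgrib layout of one .idx file next to each GRIB file (the directory and file names below are made up for the demonstration):

```python
import glob
import os
import tempfile

def remove_empty_idx(directory):
    """Delete zero-byte .idx files (presumed corrupt) and report them."""
    removed = []
    for idx in glob.glob(os.path.join(directory, "*.idx")):
        if os.path.getsize(idx) == 0:  # only empty indexes are removed
            os.remove(idx)
            removed.append(os.path.basename(idx))
    return sorted(removed)

# demonstration on a throwaway directory
d = tempfile.mkdtemp()
open(os.path.join(d, "a.grib.idx"), "w").close()   # empty -> removed
with open(os.path.join(d, "b.grib.idx"), "w") as f:
    f.write("valid index")                          # non-empty -> kept
print(remove_empty_idx(d))  # ['a.grib.idx']
```

Passing backend_kwargs={'indexpath': ''}, as in the repros above, avoids writing .idx files in the first place.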
That's interesting. Do you happen to have a theory of why this error would appear in parallel but not in serial?
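One plausible mechanism for "fails in parallel, works in serial" is lazy first-use initialization of global state without a lock: two threads can both observe "not initialized", and one proceeds against half-built state. This is an illustration of that generic pattern and its classic fix, not ecCodes' actual code:

```python
import threading

_defs = None
_lock = threading.Lock()

def get_definitions():
    """Lazily build shared state; the lock makes first use race-free.
    Without the lock, concurrent first calls could race."""
    global _defs
    with _lock:
        if _defs is None:
            _defs = {"path": "/share/eccodes/definitions"}  # made-up value
    return _defs

results = []
threads = [threading.Thread(target=lambda: results.append(get_definitions()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every thread observed the same fully-built object
print(all(r is results[0] for r in results))  # True
```

A serial first call (as in the parallel=False-then-True observation above) hides such a race, because the state is already built before any concurrency starts.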
Could it be that eccodes isn't thread-safe somehow? When I manually open multiple files using cfgrib.open_datasets via multiple processes, I don't get the error.
I do this by adding a resource spec of 1 process per task, meaning that a single task will run per worker regardless of the number of threads.
Tentatively, a workaround?
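The "one process per task" workaround, sketched generically: each worker process gets its own copy of any non-thread-safe global library state, so concurrent opens cannot race on it. open_one is a hypothetical stand-in for cfgrib.open_datasets, and the fork start method is assumed (Linux):

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def open_one(path):
    # in the real workaround this would call cfgrib.open_datasets(path, ...)
    return ("opened", path)

paths = ["f000.grib2", "f003.grib2", "f006.grib2"]

# each task runs in its own process, so no shared C-library state
with ProcessPoolExecutor(max_workers=3,
                         mp_context=mp.get_context("fork")) as pool:
    results = list(pool.map(open_one, paths))
print(results)
```

With dask this corresponds roughly to running workers with one thread each (e.g. `dask.distributed.Client(processes=True, threads_per_worker=1)`); the trade-off is extra serialization overhead between processes.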
The ecCodes library has to be built with thread safety enabled. See https://confluence.ecmwf.int/display/UDOC/Is+ecCodes+thread-safe+-+ecCodes+FAQ
Thanks for the information. I am stumped then; can you think of another reason why I (and others) would see this behavior?
Looks like the conda recipe does NOT enable the thread safety flags. I will look into this
Awesome. Thanks for looking into it. Not all heroes wear capes :)
Sounds like you're on the right path. A few years ago, when cfgrib was still a baby :), I was getting an error while trying to read compressed GRIB files, because the conda recipe for eccodes did not include the compression library due to a license issue. So in the end, that problem too was on the eccodes side on conda.
I have submitted a pull request on conda... which has now been merged
Dear Guido, Please try again and re-install ecCodes. Let me know if the issue is now fixed
I've seen the update on GitHub but cannot force an update of eccodes with the new recipe. Do I need to wait for a new version, or is there a way to test this?
Can you try to update your conda eccodes to version "eccodes-2.18.0-hf05d9b7_0" ?
It still does not find it in my current channels:
(nwp-py3) g@c:~/$ conda install -c conda-forge eccodes=2.18.0=hf05d9b7_0
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
- eccodes==2.18.0=hf05d9b7_0
Current channels:
- https://conda.anaconda.org/conda-forge/osx-64
- https://conda.anaconda.org/conda-forge/noarch
- https://repo.anaconda.com/pkgs/main/osx-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/r/osx-64
- https://repo.anaconda.com/pkgs/r/noarch
I think you're right - it looks like it built but somehow is not available. We'll investigate
Hi @guidocioni , I'm not sure if you're on macos or Linux, but we've managed to update the conda version. Could you do the following:
conda search eccodes -c conda-forge
and if you see a 2.18.0 version with _1 at the end, install that version please. It takes the conda servers a little while to update their indexes, but it's appeared now at least on macos.
Yep, it always takes a little bit of time. I will test it tomorrow and let you know. Anyway, you can use one of the MWEs present in this thread with some downloaded data; I think you should be able to reproduce the error.
I can confirm this issue is resolved on eccodes 2.18.0-hc7b4307_1! I just tried to read 6 files with parallel=False and parallel=True, while taking care of removing the idx files every time, and both methods worked. Before the update it used to fail with parallel=True as described in the posts before.
Thank you all for the input :)
@alexamici I think you can close this
I can also confirm. I just ran a test using:
delays = []
for file in files:
    delays.append(dask.delayed(cfgrib.open_datasets)(file, backend_kwargs={'indexpath': ''}))
client.persist(delays)
It previously resulted in killed workers as described. Now the issue is resolved on eccodes 2.18.0-hc7b4307_1.
Thanks for reacting to this so quickly. :)
Dear friends, I have to tell you that I never knew my issue with the latest updates of cfgrib came down to parallel=True and thread-safe mode being disabled during installation. It would be great to see a website with some common pitfalls.
Btw: my system ran on an old version from 2021.
Is there a way to enable multi-threading without conda? I've installed cfgrib using
pip install ecmwflibs eccodes cfgrib
With versions: ecmwflibs==0.5.6, eccodes==1.6.1, cfgrib==0.9.10.4, on Python 3.8.16, using a Docker image.