import xarray causes fatal python crash on windows when h5netcdf and netcdf4 are installed

Open daanscheltens opened this issue 1 year ago • 3 comments

What happened?

On Windows with Python (3.9 and 3.10), running import xarray crashes Python if both the netcdf4 and h5netcdf packages are installed.

What did you expect to happen?

I expected that xarray would import normally, without a fatal python error.

Minimal Complete Verifiable Example

# On windows:
pip install xarray
pip install h5netcdf
pip install netcdf4

# This results in a crash
python -c "import xarray"

# The crash does not occur when I first import h5netcdf and then import xarray, so the next line does not result in a crash:
python -c "import h5netcdf;import xarray"

# The crash does not occur on linux.
# The crash does not occur when I have only h5netcdf or netcdf4 installed.

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

command: python -c "import xarray"
C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\h5py\__init__.py:36: UserWarning: h5py is running against HDF5 1.12.1 when it was built against 1.12.2, this may cause problems
  _warn(("h5py is running against HDF5 {0} when it was built against {1}, "
Warning! ***HDF5 library version mismatched error***
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.12.2, library is 1.12.1
        SUMMARY OF THE HDF5 CONFIGURATION
        =================================

General Information:
-------------------
                   HDF5 Version: 1.12.1
                  Configured on: 2022-03-04
                  Configured by: Ninja
                    Host system: Windows-10.0.17763
              Uname information: Windows
                       Byte sex: little-endian
             Installation point: D:/bld/hdf5_split_1646412547396/_h_env/Library

Compiling Options:
------------------
                     Build Mode: RELEASE
              Debugging Symbols: OFF
                        Asserts: OFF
                      Profiling: OFF
             Optimization Level: OFF

Linking Options:
----------------
                      Libraries: 
  Statically Linked Executables: OFF
                        LDFLAGS: /machine:x64
                     H5_LDFLAGS: 
                     AM_LDFLAGS: 
                Extra libraries: D:/bld/hdf5_split_1646412547396/_h_env/Library/lib/libcurl.lib;D:/bld/hdf5_split_1646412547396/_h_env/Library/lib/libssl.lib;D:/bld/hdf5_split_1646412547396/_h_env/Library/lib/libcrypto.lib
                       Archiver: C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/VC/Tools/MSVC/14.16.27023/bin/HostX64/x64/lib.exe
                         Ranlib: :

Languages:
----------
                              C: YES
                     C Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/VC/Tools/MSVC/14.16.27023/bin/HostX64/x64/cl.exe 19.16.27045.0
                       CPPFLAGS: 
                    H5_CPPFLAGS: 
                    AM_CPPFLAGS: 
                         CFLAGS:   /DWIN32 /D_WINDOWS 
                      H5_CFLAGS: /W3;/wd4100;/wd4706;/wd4127
                      AM_CFLAGS: 
               Shared C Library: YES
               Static C Library: YES

                        Fortran: OFF
               Fortran Compiler:  
                  Fortran Flags: 
               H5 Fortran Flags: 
               AM Fortran Flags: 
         Shared Fortran Library: YES
         Static Fortran Library: YES

                            C++: ON
                   C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/VC/Tools/MSVC/14.16.27023/bin/HostX64/x64/cl.exe 19.16.27045.0
                      C++ Flags:   
                   H5 C++ Flags: /W3;/wd4100;/wd4706;/wd4127
                   AM C++ Flags: 
             Shared C++ Library: YES
             Static C++ Library: YES

                            JAVA: OFF
                 JAVA Compiler:  

Features:
---------
                   Parallel HDF5: OFF
Parallel Filtered Dataset Writes: 
              Large Parallel I/O: 
              High-level library: ON
                Build HDF5 Tests: ON
                Build HDF5 Tools: ON
                    Threadsafety: ON (recursive RW locks: ) 
             Default API mapping: v112
  With deprecated public symbols: ON
          I/O filters (external):  DEFLATE
                             MPE: 
                      Direct VFD: 
                      Mirror VFD: 
              (Read-Only) S3 VFD: 1
            (Read-Only) HDFS VFD: 
                         dmalloc: 
  Packages w/ extra debug output: 
                     API Tracing: OFF
            Using memory checker: OFF
 Memory allocation sanity checks: OFF
          Function Stack Tracing: OFF
                Use file locking: best-effort
       Strict File Format Checks: OFF
    Optimization Instrumentation: 
Bye...
Error: Process completed with exit code 1.

Anything else we need to know?

This bug is reproduced by the github action runner: https://github.com/daanscheltens/test-netcdf4/actions/runs/3196339371/jobs/5218135577

This action is part of a dedicated empty repository that just contains this action workflow: https://github.com/daanscheltens/test-netcdf4/blob/main/.github/workflows/action.yml

Environment

python -c "import h5netcdf; import xarray as xr;xr.show_versions()"

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 106 Stepping 6, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 58.1.0
pip: 22.2.2
conda: None
pytest: None
IPython: None
sphinx: None

daanscheltens, Oct 06 '22 10:10

I suspect that's because you're installing via pip, where the netcdf4 wheel and the h5py wheel (which h5netcdf depends on) each bundle their own copy of the HDF5 library (but different versions, apparently). If that's correct, you should get the same error when importing netCDF4 and then h5netcdf:

import netCDF4
import h5netcdf

Note that because there is a separate libhdf5 package for conda, you don't have that issue there (and you might want to use micromamba in CI).
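To see which HDF5 version each wheel reports without triggering the crash, each package can be imported in its own interpreter process. The following is only a rough sketch: `probe` and `SNIPPETS` are hypothetical names made up for this illustration, and it assumes the version is read from `h5py.version.hdf5_version` and `netCDF4.__hdf5libversion__`.

```python
import subprocess
import sys


def probe(snippet):
    """Run `snippet` in a fresh interpreter; return stripped stdout, or None on failure."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Import failed or the process crashed; report that instead of raising.
        return None
    return result.stdout.strip()


# Each package is imported in its own process, so the two bundled
# HDF5 libraries are never loaded side by side.
SNIPPETS = {
    "h5py": "import h5py; print(h5py.version.hdf5_version)",
    "netCDF4": "import netCDF4; print(netCDF4.__hdf5libversion__)",
}

if __name__ == "__main__":
    for name, snippet in SNIPPETS.items():
        print(name, "HDF5:", probe(snippet))
```

If the two reported versions differ, that would be consistent with the "Headers are 1.12.2, library is 1.12.1" line in the log above.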

keewis, Oct 06 '22 11:10

Thanks for the quick reply.

I do indeed get the same error when trying:

import netCDF4
import h5netcdf

Since the h5py library is used by h5netcdf, I also tried the following imports, which again give the same error:

import netCDF4
import h5py

The conclusion is that the HDF5 library inside netcdf4 is indeed older, and the check in h5py gives the fatal error.

Both netCDF4 and h5netcdf are optional requirements for xarray. Why is netCDF4 then imported when I don't use it for a certain calculation?

Note that conda is not an option for me, as it is incompatible with necessary third-party software.

daanscheltens, Oct 06 '22 12:10

See https://github.com/pydata/xarray/issues/6726#issuecomment-1257279640.

We are thinking about doing the backend imports only when needed.
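As a generic illustration of what "imports only when needed" could look like (this is not xarray's actual implementation; `BACKEND_MODULES` and `load_backend` are hypothetical names), the import can be deferred until a backend is explicitly requested:

```python
import importlib

# Hypothetical registry: backend name -> name of the module that implements it.
BACKEND_MODULES = {
    "netcdf4": "netCDF4",
    "h5netcdf": "h5netcdf",
}


def load_backend(name, registry=BACKEND_MODULES):
    """Import and return the module for `name` only when it is requested."""
    try:
        module_name = registry[name]
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None
    # The import happens here, at first use, not when this file is imported.
    # importlib caches modules in sys.modules, so repeated calls are cheap.
    return importlib.import_module(module_name)
```

With a pattern like this, netCDF4 (and its bundled HDF5) would only be loaded if the netcdf4 backend is actually selected, so a session that uses only h5netcdf would never load both HDF5 copies.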

headtr1ck, Oct 06 '22 12:10