
making my own netcdf4 wheel results in a broken package

Open bluppfisk opened this issue 2 years ago • 14 comments

There is a problem with the PyPI package (mentioned elsewhere; basically, it spams megabytes' worth of warnings). The error can be circumvented by downloading the source and running pip install from inside the source folder.

However, to make things easier in my organisation, I want to build a wheel for our target system(s) that can be installed from our internal repository without going through the hoops described above. However, the resulting package seems broken: the netCDF4 module does not have the Dataset attribute, as evidenced by the error:

ImportError: cannot import name 'Dataset' from 'netCDF4' (unknown location).

System:

  • netCDF4==1.6.2 and netCDF4==1.6.3
  • Debian Linux Bullseye
  • steps to reproduce (NOTE: on some Debian systems, version 1.6.2 will fail to compile from source because of comments in the header files; this was patched in https://github.com/Unidata/netcdf4-python/pull/1219/files and is available in 1.6.3):
wget https://github.com/Unidata/netcdf4-python/archive/refs/tags/v1.6.2.tar.gz
tar -xvzf v1.6.2.tar.gz
cd netcdf4-python-1.6.2
python3 -m venv ./env && source env/bin/activate
pip install wheel
python3 setup.py bdist_wheel   # this calls build, install as well
python3 -m venv ./env && source env/bin/activate
pip3 install dist/netCDF4-1.6.2-cp310-cp310-linux_x86_64.whl
python3 -c "from netCDF4 import Dataset"  # I get an error

deactivate && rm -rf ./env
python3 -m venv ./env && source env/bin/activate
pip install .
python3 -c "from netCDF4 import Dataset"  #no error

What is it that pip install does differently from python3 setup.py bdist_wheel (followed by a pip install of the wheel)?

Edit: came here after SO didn't yield any immediate answers (https://stackoverflow.com/questions/75874745/compiling-and-installing-with-pip-install-vs-python3-m-build)

bluppfisk avatar Mar 29 '23 14:03 bluppfisk

What is it that pip install does differently from python3 setup.py bdist_wheel (followed by a pip install of the wheel)?

Modern pip follows PEP 517/518 and will create a build environment as specified in pyproject.toml. One should not be calling setup.py directly like that anymore. However, if you want to build your own wheel, I recommend using build:

python3 -m pip install build
python3 -m build --wheel . --outdir dist

Then install that wheel.

ocefpaf avatar Mar 29 '23 15:03 ocefpaf

Thank you for answering!

This indeed works to some extent. It builds a wheel that does not exhibit the problem above. However, some things that are included in the PyPI package still seem to be missing, notably libnetcdf.so.18, as evidenced when trying to use the package:

>   from ._netCDF4 import *
E   ImportError: libnetcdf.so.18: cannot open shared object file: No such file or directory

.tox/py39/lib/python3.9/site-packages/netCDF4/__init__.py:3: ImportError

If I use the package on a system that has libnetcdf-dev and libhdf5-dev installed, it works fine. But the PyPI package does not require a system with those packages installed.

Note that the system that builds the custom netcdf wheel does have those packages installed as they are required for building - but they should not be required for running.
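
One way to confirm where the libraries come from (a sketch; the extension path is hypothetical and depends on your Python version and install location) is to check what the compiled extension links against:

```shell
# Hypothetical path; adjust to your Python version and environment.
EXT=env/lib/python3.10/site-packages/netCDF4/_netCDF4.cpython-310-x86_64-linux-gnu.so
if [ -f "$EXT" ]; then
    # Shows whether libnetcdf/libhdf5 resolve to system paths or to libs bundled in the wheel
    ldd "$EXT" | grep -Ei 'netcdf|hdf5'
fi
```

If the entries point at /usr/lib, the wheel depends on the system packages; the PyPI wheels instead resolve to copies shipped inside the package directory.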

Both systems run the same debian version (docker images).

bluppfisk avatar Mar 29 '23 18:03 bluppfisk

The PyPI wheels have those libraries built and bundled for you. That is why we recommend using the built wheels!

If you want to build your own wheel, you will need the C libraries installed to compile netcdf4. See https://github.com/Unidata/netcdf4-python#development-installation
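
On Debian, that amounts to something like the following (a sketch; the package names are the Debian ones mentioned earlier in this thread, and HDF5_DIR/NETCDF4_DIR are the environment variables the netcdf4-python README says setup.py consults when the libraries sit in a non-standard prefix):

```shell
# Build-time C dependencies (Debian):
#   sudo apt-get install libnetcdf-dev libhdf5-dev
# Point setup.py at the library prefix if it is non-standard:
export HDF5_DIR=/usr
export NETCDF4_DIR=/usr
# Then build the wheel from the source checkout:
#   python3 -m build --wheel . --outdir dist
```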

ocefpaf avatar Mar 29 '23 18:03 ocefpaf

Can you be more specific about the problem you mentioned ("spams megabytes worth of warnings")? We can hopefully fix that in the pre-built wheels.

jswhit avatar Mar 29 '23 21:03 jswhit

I would like to use the PyPI packages, but they spam so many warnings that they're technically unusable. I dug out a sample for @jswhit (see below).

By chance I found out that installing from source (from an unzipped release) did not exhibit this behaviour. In order to bring this improved experience to our team (but without having them install packages and compile code), I'd like to build the wheel and include the libs as well. I see now that the PyPI wheels have the libraries included. Would you be able to point me in the right direction as to how to do this for my own wheels?

Sample HDF5 errors when using the PyPI 1.6.2/1.6.3 packages (this block is repeated thousands of times), even for a very simple read action on an HDF5 file. It does not impact functionality; it just bloats the logs.

HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitRoundNumberOfSignificantBits'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitGroomNumberOfSignificantDigits'
    major: Attribute
    minor: Object not found

bluppfisk avatar Mar 29 '23 21:03 bluppfisk

@bluppfisk what modules were imported by the script that produced these warnings? I suspect a binary incompatibility between HDF5 versions included in netCDF4 and some other package (perhaps xarray or h5py?).
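
A quick way to check for such a mismatch is to print the HDF5 version each binary package bundles. A hedged sketch using the public netCDF4.__hdf5libversion__ and h5py.version.hdf5_version attributes; packages that are not installed are simply skipped:

```python
import importlib.util

def bundled_hdf5_versions():
    """Report the HDF5 library version each installed binary package was built with."""
    versions = {}
    if importlib.util.find_spec("netCDF4"):
        import netCDF4
        versions["netCDF4"] = netCDF4.__hdf5libversion__
    if importlib.util.find_spec("h5py"):
        import h5py
        versions["h5py"] = h5py.version.hdf5_version
    return versions

# Two different version strings here would hint at a binary conflict.
print(bundled_hdf5_versions())
```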

jswhit avatar Mar 30 '23 13:03 jswhit

@jswhit there's definitely xarray, which uses netCDF4 as a backend. I also believe this problem occurs mostly in a multiprocessing environment, but I'm not 100% sure.

bluppfisk avatar Mar 30 '23 13:03 bluppfisk

Just asking whether you have discovered https://github.com/pypa/auditwheel for your wheels. It copies the linked libraries into the wheel, so you can install it on other compatible systems along with the libraries.
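
For reference, a hedged sketch of the auditwheel workflow (the wheel filename is the one from the build steps earlier in this thread and will vary; the block is guarded so it only acts if the wheel exists):

```shell
# First: python3 -m pip install auditwheel
WHEEL=dist/netCDF4-1.6.2-cp310-cp310-linux_x86_64.whl
if [ -f "$WHEEL" ]; then
    auditwheel show "$WHEEL"                          # list the external shared libraries it links
    auditwheel repair "$WHEEL" --wheel-dir wheelhouse # copy them into a new, self-contained wheel
fi
```

The repaired wheel in wheelhouse/ is the one to upload to an internal repository.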

matthew-brett avatar Mar 30 '23 13:03 matthew-brett

The multiprocessing bit may be the culprit - the netCDF4 and HDF5 libs are not thread-safe.

jswhit avatar Mar 30 '23 13:03 jswhit

Just asking whether you have discovered https://github.com/pypa/auditwheel for your wheels. It copies the linked libraries into the wheel, so you can install it on other compatible systems along with the libraries.

Thanks, this is indeed something I found while searching for how to include libraries. But I'm not sure whether/how it automatically decides which libraries it needs to include, or whether/how to compile them.

I'm digging through the Travis files in this repository, but I fail to understand them. Too inexperienced, so I'd appreciate any pointers.

@jswhit: it may very well be, but the beauty is that the versions installed from source (i.e. not from PyPI) do not exhibit this behaviour, so I was trying to recreate that experience (but distributed).

bluppfisk avatar Mar 30 '23 14:03 bluppfisk

Agreed - it's not likely a threading issue then. How was the xarray package installed: via conda, pip, or a locally built wheel? I'm guessing it's using a different version of the HDF5 library. Simply changing the order of the imports may make the warnings go away if this is the issue. If you can post a simple script that reproduces the warnings on your system, along with the versions of xarray and netcdf4-python used and how they were installed, that would be a big help. (Also, what platform: Windows, macOS, or Linux?)

jswhit avatar Mar 30 '23 16:03 jswhit

I'll do so in the coming days (takes some time to extract something useful).

Answers to your questions:

  • xarray and netCDF4 are installed via pip (if I use locally built wheels, the errors disappear; same goes for the conda version)
  • versions:
xarray==2022.12.0
netCDF4==1.6.2 (but .3 exhibits the same behaviour)
  • OS: Linux (Debian bullseye)
  • netCDF4 isn't directly imported anywhere in my program; xarray imports it somewhere down the line.

In the meantime, could you provide any further pointers on how to compile + include those libraries in a wheel? I tried running the ./configure script but it complains about having nothing to compile. I don't know how that github workspace is set up.

bluppfisk avatar Mar 30 '23 23:03 bluppfisk

@bluppfisk are xarray and netCDF4 the only packages in your env? xarray is a pure Python package; no C lib is pulled in unless you request a specific backend, like HDF5 via h5py/h5netcdf, or netCDF via netcdf4. That means the only C libs you should have are the ones bundled in the netCDF4 wheel. Those are OK and work fine; there should be no conflicts. Here is an example of an env with them:

pip install xarray netcdf4 pooch
Collecting xarray
  Downloading xarray-2023.3.0-py3-none-any.whl (981 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 981.2/981.2 kB 6.2 MB/s eta 0:00:00
Collecting netcdf4
  Downloading netCDF4-1.6.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/5.2 MB 3.7 MB/s eta 0:00:00
Collecting numpy>=1.21
  Downloading numpy-1.24.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 13.8 MB/s eta 0:00:00
Collecting pandas<2,>=1.4
  Downloading pandas-1.5.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.0/12.0 MB 16.3 MB/s eta 0:00:00
Collecting packaging>=21.3
  Using cached packaging-23.0-py3-none-any.whl (42 kB)
Collecting cftime
  Downloading cftime-1.6.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 20.2 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.1
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 kB 22.0 MB/s eta 0:00:00
Collecting pytz>=2020.1
  Downloading pytz-2023.3-py2.py3-none-any.whl (502 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 502.3/502.3 kB 16.3 MB/s eta 0:00:00
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, six, packaging, numpy, python-dateutil, cftime, pandas, netcdf4, xarray
Successfully installed cftime-1.6.2 netcdf4-1.6.3 numpy-1.24.2 packaging-23.0 pandas-1.5.3 python-dateutil-2.8.2 pytz-2023.3 six-1.16.0 xarray-2023.3.0

and then, inside the Python interpreter:

>>> import xarray
>>> ds = xarray.tutorial.open_dataset("air_temperature", engine="netcdf4")
>>> ds
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly

As you can see, I don't get huge log output. This is also on Linux, with an env that is isolated from any system libraries and uses only the wheels provided on PyPI.

If you can provide a means for us to reproduce your env, maybe we can debug this further, but at this point I suspect you may have more packages installed and/or some conflicts in there.

PS: pooch is also pure Python and only required to fetch the tutorial datasets. You don't need it when testing with your local files, and it does not interfere with the wrapped C libs.

ocefpaf avatar Mar 31 '23 12:03 ocefpaf

See #1242 for a way to reproduce this issue. I suggest the discussion move to that issue, to avoid mixing the problem of building your own wheel with the apparently new verbosity of HDF5.

ocefpaf avatar Mar 31 '23 15:03 ocefpaf