netcdf4-python
Core dump when opening OPeNDAP dataset after first creating local file
I encounter the following problem on Ubuntu 18.04 with conda and Python 3.6-3.8 (not with Python 2). The libnetcdf version is 4.7.1, which is pinned because I need gdal in the same environment.
The following lines:
from netCDF4 import Dataset
d1 = Dataset('tmp.nc', 'w')
d1.close()
d2 = Dataset('http://thredds.met.no/thredds/dodsC/meps25epsarchive/2019/11/26/meps_mbr0_extracted_2_5km_20191126T00Z.nc')
print(d2.variables['time'])
give a core dump:
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
long_name: time
standard_name: time
units: seconds since 1970-01-01 00:00:00 +00:00
_ChunkSizes: 1
unlimited dimensions: time
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
Segmentation fault (core dumped)
If I omit the first generation of the local file (d1), it returns as expected:
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
long_name: time
standard_name: time
units: seconds since 1970-01-01 00:00:00 +00:00
_ChunkSizes: 1
unlimited dimensions: time
current shape = (67,)
filling off
This is the same for two different OPeNDAP datasets with time as an unlimited dimension. For another dataset where time has a fixed size, there is no core dump.
Thus the problem can be formulated as:
- a core dump results when opening an OPeNDAP dataset with unlimited time dimension after first opening/closing a local file.
Here is the back trace:
$ gdb --args python fail2.py
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
(gdb) r
Starting program: /home/knutfd/miniconda2/envs/opendrift_p36/bin/python fail2.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3853700 (LWP 11144)]
[New Thread 0x7ffff3052700 (LWP 11145)]
[New Thread 0x7ffff0851700 (LWP 11146)]
[New Thread 0x7fffec050700 (LWP 11147)]
[New Thread 0x7fffe984f700 (LWP 11148)]
[New Thread 0x7fffe704e700 (LWP 11149)]
[New Thread 0x7fffe484d700 (LWP 11150)]
[New Thread 0x7fffe14bf700 (LWP 11151)]
[Thread 0x7fffe14bf700 (LWP 11151) exited]
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
long_name: time
standard_name: time
units: seconds since 1970-01-01 00:00:00 +00:00
_ChunkSizes: 1
unlimited dimensions: time
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff65d93c9 in nclistfree ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
(gdb) bt
#0 0x00007ffff65d93c9 in nclistfree ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#1 0x00007ffff661884c in nc4_nc4f_list_del ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#2 0x00007ffff661eb1b in NC4_abort ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#3 0x00007ffff65d0274 in nc_abort ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#4 0x00007ffff664288c in NCD2_close ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#5 0x00007ffff65d02e6 in nc_close ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/../../../libnetcdf.so.15
#6 0x00007ffff679d573 in __pyx_pw_7netCDF4_8_netCDF4_7Dataset_15_close ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/_netCDF4.cpython-36m-x86_64-linux-gnu.so
#7 0x00007ffff679ea4d in __pyx_tp_dealloc_7netCDF4_8_netCDF4_Dataset ()
from /home/knutfd/miniconda2/envs/opendrift_p36/lib/python3.6/site-packages/netCDF4/_netCDF4.cpython-36m-x86_64-linux-gnu.so
#8 0x00005555556a0b2d in delete_garbage (old=<optimized out>, collectable=<optimized out>)
at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Modules/gcmodule.c:865
#9 collect ()
at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Modules/gcmodule.c:1016
#10 0x0000555555740a1a in _PyGC_CollectNoFail ()
at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Modules/gcmodule.c:1626
#11 0x00005555557004f8 in PyImport_Cleanup ()
at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Python/import.c:431
#12 0x0000555555765c91 in Py_FinalizeEx ()
at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Python/pylifecycle.c:608
#13 0x0000555555770f6c in Py_Main ()
at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Modules/main.c:830
#14 0x0000555555638cde in main ()
at /home/conda/feedstock_root/build_artifacts/python_1573054930886/work/Programs/python.c:69
#15 0x00007ffff77e6b97 in __libc_start_main (main=0x555555638bf0 <main>, argc=2, argv=0x7fffffffda48,
init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffda38)
at ../csu/libc-start.c:310
#16 0x0000555555721242 in _start () at ../sysdeps/x86_64/elf/start.S:103
We found that the problem is avoided with python-netcdf4=1.5.1.2 and libnetcdf=4.6.2
Can't reproduce on my macos machine with libnetcdf 4.6.2, 4.7.0 or 4.7.2. Can't see how the netcdf4-python version would matter since it's a segfault in the netcdf-c library.
I forgot to say that conda-forge was used (as libnetcdf=4.7.1 is not available in the anaconda channel). The problem should be reproducible by:
conda create -n fail -c conda-forge libnetcdf=4.7.1 netcdf4=1.5.3
conda activate fail
and then these lines:
from netCDF4 import Dataset
d1 = Dataset('tmp.nc', 'w')
d1.close()
d2 = Dataset('http://thredds.met.no/thredds/dodsC/meps25epsarchive/2019/11/26/meps_mbr0_extracted_2_5km_20191126T00Z.nc')
print(d2.variables['time'])
do not produce a core dump this time. However, time is an empty variable instead of having length 67, which is the correct length and is what I get within this environment:
conda create -n nofail -c conda-forge libnetcdf=4.6.2 netcdf4=1.5.1
For those trying to run the OP's test script, you have to change the date in the filename - older files age off.
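Since the date appears both in the directory path and in the filename, a small helper can make swapping it less error-prone. This is only a sketch: the name `meps_url` is made up here, and it assumes the archive keeps the same path layout as the URL in the original report and that a file for the chosen date is still available.

```python
from datetime import date

# Base path taken from the URL in the original report.
BASE = "http://thredds.met.no/thredds/dodsC/meps25epsarchive"

def meps_url(day: date) -> str:
    """Build the OPeNDAP URL for the 00Z file of the given day.

    Hypothetical helper: assumes the YYYY/MM/DD directory layout and
    filename pattern seen in the original URL still apply.
    """
    return (f"{BASE}/{day:%Y/%m/%d}/"
            f"meps_mbr0_extracted_2_5km_{day:%Y%m%d}T00Z.nc")

# Reproduces the URL from the original report.
print(meps_url(date(2019, 11, 26)))
```

Passing a recent date instead of 2019-11-26 should give a URL to try in the test script.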
Confirmed on macOS with Anaconda. Must be specific to the Anaconda packages though - if I build netcdf 4.7.1 and netcdf4-python 1.5.3 myself it works fine.
From the traceback, it looks perhaps like the netcdf-c lib is trying to access memory that has already been deallocated by the python garbage collector.
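One way to probe this might be to close the remote dataset explicitly, so that `nc_close` runs while the interpreter is still fully alive rather than during the garbage collection at shutdown seen in frames #6 to #12 of the back trace. The sketch below is not a confirmed workaround, and the crash may simply move to the explicit close. The helper name `time_shape` and its `opener` parameter are made up for illustration; `opener` would be `netCDF4.Dataset` in practice.

```python
from contextlib import closing

def time_shape(url, opener):
    """Open url with opener, return the shape of 'time', then close.

    Hypothetical helper: opener is any callable returning an object
    with a .variables mapping and a .close() method, such as
    netCDF4.Dataset. closing() guarantees close() is called before
    the function returns, instead of at interpreter shutdown.
    """
    with closing(opener(url)) as ds:
        return ds.variables["time"].shape
```

With the test script above this could be called as `time_shape(url, Dataset)` after `d1.close()`, where `url` is the OPeNDAP address. If the segfault then happens at this call instead of at exit, that would suggest the problem is in the close path itself rather than in the order of deallocation.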
Error also disappears if I upgrade to conda-forge libnetcdf 4.7.3 (but this requires building netcdf4-python from source since the conda-forge package is pinned on 4.7.1 for some reason).
For binary compatibility, all packages in conda-forge are pinned to an exact version of libnetcdf, and they migrate in lock-step. Might be able to request a bump here: https://github.com/conda-forge/conda-forge-pinning-feedstock/issues
It seems that this is no longer an issue with:
netcdf4 1.5.3 nompi_py38hd35fb8e_102 conda-forge