netcdf4-python
Segfaults when using VLEN arrays and not closing datasets on Python 3
As described here: https://github.com/Unidata/netcdf4-python/issues/218#issuecomment-43287973
The segmentation faults appear when attempting to read array values from a netCDF4.Variable with dtype=str if previously opened datasets were not closed.
Here is a Travis log that should (in principle) be sufficient for reproducing this; when I have time, I will attempt to make a simpler test case: https://travis-ci.org/shoyer/xray/jobs/25466389#L120
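For reference, a minimal sketch of the failure pattern described above might look like the following; the file name test_strings.nc and the variable name v are placeholders assumed here, not taken from the linked log:

import netCDF4 as nc

# open many datasets without closing them and read a dtype=str variable each time;
# test_strings.nc is assumed to be a netCDF-4 file with a VLEN string variable "v"
for i in range(40):
    d = nc.Dataset('test_strings.nc')
    values = d.variables['v'][:]   # read the str values from yet another unclosed file
    print(i, values)
    # d.close() is deliberately omitted: this is the "datasets not closed" condition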
Hi,
I also suffer the same bug, reproducible with this very simple script:
issue261.py:

import netCDF4 as nc
for i in xrange(1, 33):
    print(i)
    d = nc.Dataset('issue261.nc')
with issue261.nc generated this way:

ncgen -b -k netCDF-4 issue261.cdl

issue261.cdl:

netcdf issue261 {
dimensions:
    one = 1 ;
variables:
    string v(one) ;
}
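If ncgen is not at hand, roughly the same test file can presumably be produced from Python itself; this is only a sketch mirroring the CDL above:

import netCDF4 as nc

# create issue261.nc with a single VLEN string variable, as in the CDL
d = nc.Dataset('issue261.nc', 'w', format='NETCDF4')
d.createDimension('one', 1)
v = d.createVariable('v', str, ('one',))
v[0] = 'x'   # the CDL leaves v unset; writing a value here is optional
d.close()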
My configuration:
- Linux cmg-40 2.6.26-2-amd64 #1 SMP Wed Aug 19 22:33:18 UTC 2009 x86_64 GNU/Linux
- hdf5-1.8.14, netcdf-c-4.3.3.1, Python-2.7.9, Cython-0.22, numpy-1.9.2 and netcdf4-python-1.1.7rel
segfault trace on gdb:

gdb python issue261.py
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) run
Starting program: python issue261.py
[Thread debugging using libthread_db enabled]
[New Thread 0x7f26661c26e0 (LWP 13946)]
[New Thread 0x41d1f950 (LWP 13949)]
[New Thread 0x42520950 (LWP 13950)]
[New Thread 0x42d21950 (LWP 13951)]
1
2
3
4
5
6
7
8
9
10
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f26661c26e0 (LWP 13946)]
0x00007f26606259e4 in H5F_addr_decode () from libhdf5.so.9
Current language: auto; currently asm
(gdb) where
#0 0x00007f26606259e4 in H5F_addr_decode () from libhdf5.so.9
#1 0x00007f26607cd00c in H5T_vlen_disk_isnull () from libhdf5.so.9
#2 0x00007f26607b57ee in H5T__conv_vlen () from libhdf5.so.9
#3 0x00007f2660736849 in H5T_convert () from libhdf5.so.9
#4 0x00007f2660609a36 in H5D_get_create_plist () from libhdf5.so.9
#5 0x00007f26605f48ea in H5Dget_create_plist () from libhdf5.so.9
#6 0x00007f26617c79ff in read_var (grp=0x2598680, datasetid=83886081, obj_name=0x7fff6e1d4cd4 "v", ndims=
Using keepweakref=True when opening the Dataset eliminates the segfault for me.
import netCDF4 as nc
for i in xrange(1, 33):
    print(i)
    d = nc.Dataset('issue261.nc', keepweakref=True)
This suggests that the garbage collector is not triggering the Dataset __dealloc__ method, and that some internal data structures inside the HDF5 and/or netCDF library are overflowing when too many files are left open. I guess there are two possible solutions:
- figure out why the dataset is not going out of scope (where is the reference being kept?), and fix that so the files do get closed (a sketch for checking this is included below);
- file a netcdf bug report, since the segfaults should not happen when opening 33 files. This will require reproducing the segfault in a simple C program.
Of course, addressing both of these at the same time is probably a good idea.
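As a starting point for the first option, something like the sketch below could show whether the Dataset really survives garbage collection and, if so, what still refers to it; it assumes issue261.nc from the earlier comment and that Dataset instances are weak-referenceable:

import gc
import weakref
import netCDF4 as nc

d = nc.Dataset('issue261.nc')
ref = weakref.ref(d)
del d
gc.collect()

if ref() is None:
    print('Dataset was collected, so __dealloc__ should have closed the file')
else:
    # something still holds a strong reference; list the suspects
    print('Dataset is still alive; referrers:')
    for obj in gc.get_referrers(ref()):
        print(type(obj), repr(obj)[:80])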
Of course, using the Python context manager also avoids the segfault (by making sure the file is closed).
import netCDF4 as nc
for i in xrange(1, 51):
    print(i)
    with nc.Dataset('issue261.nc') as f:
        print(f)
I have been unable to reproduce the problem in a simple C program (so far).
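For what it's worth, one way to approximate that C-level test without writing C is to drive the netCDF C library directly through ctypes, bypassing netcdf4-python entirely; the sketch below is only an assumption about what such a reproduction attempt could look like:

import ctypes
import ctypes.util

# load the installed netCDF C library (the library lookup is an assumption)
libnc = ctypes.CDLL(ctypes.util.find_library('netcdf'))
NC_NOWRITE = 0

for i in range(1, 33):
    print(i)
    ncid = ctypes.c_int()
    if libnc.nc_open(b'issue261.nc', NC_NOWRITE, ctypes.byref(ncid)) != 0:
        raise RuntimeError('nc_open failed')
    varid = ctypes.c_int()
    if libnc.nc_inq_varid(ncid, b'v', ctypes.byref(varid)) != 0:
        raise RuntimeError('nc_inq_varid failed')
    value = (ctypes.c_char_p * 1)()   # char ** buffer for the single VLEN string
    if libnc.nc_get_var_string(ncid, varid, value) != 0:
        raise RuntimeError('nc_get_var_string failed')
    # nc_close(ncid) is deliberately not called, to mimic the Python loop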
The traceback provided by @jdemaria looks similar to one discussed on the h5py list:
https://groups.google.com/forum/#!msg/h5py/3v0oBQ3SVkk/qsCwQnfTxuEJ
Hi, thanks for your quick answer! I understand from the h5py discussion that the source of the problem is not in the netCDF C library but rather a threading bug in HDF5. Am I wrong?
That's what it sounds like, but it happens for me even when OMP_NUM_THREADS=1. I may try recompiling hdf5 without threading enabled and see if that makes a difference.
The segfault occurs even when hdf5 is compiled with the "threadsafe" option.
The segfault also occurs if the "with nogil" wrapper around the netCDF library calls is removed, so it does not look to be a thread-related issue.