
Segfaults when using VLEN arrays and not closing datasets on Python 3

Open shoyer opened this issue 11 years ago • 9 comments

As described here: https://github.com/Unidata/netcdf4-python/issues/218#issuecomment-43287973

The segmentation faults appear when attempting to read array values from a netCDF4.Variable with dtype=str when previous datasets were not closed.

Here is a Travis log that should (in principle) be sufficient for reproducing this. When I have time, I will attempt to make a simpler test case: https://travis-ci.org/shoyer/xray/jobs/25466389#L120

shoyer avatar May 19 '14 06:05 shoyer

Hi,

I am seeing the same bug, reproducible with this very simple script:

issue261.py:

import netCDF4 as nc
for i in xrange(1, 33):
    print(i)
    d = nc.Dataset('issue261.nc')

with issue261.nc generated this way:

ncgen -b -k netCDF-4 issue261.cdl

issue261.cdl:

netcdf issue261 {
dimensions:
    one = 1 ;
variables:
    string v(one) ;
}

  • My configuration:
    • Linux cmg-40 2.6.26-2-amd64 #1 SMP Wed Aug 19 22:33:18 UTC 2009 x86_64 GNU/Linux
    • hdf5-1.8.14, netcdf-c-4.3.3.1, Python-2.7.9, Cython-0.22, numpy-1.9.2 and netcdf4-python-1.1.7rel

segfault trace on gdb:

gdb python issue261.py
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) run
Starting program: python issue261.py
[Thread debugging using libthread_db enabled]
[New Thread 0x7f26661c26e0 (LWP 13946)]
[New Thread 0x41d1f950 (LWP 13949)]
[New Thread 0x42520950 (LWP 13950)]
[New Thread 0x42d21950 (LWP 13951)]
1
2
3
4
5
6
7
8
9
10

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f26661c26e0 (LWP 13946)]
0x00007f26606259e4 in H5F_addr_decode () from libhdf5.so.9
Current language: auto; currently asm
(gdb) where
#0 0x00007f26606259e4 in H5F_addr_decode () from libhdf5.so.9
#1 0x00007f26607cd00c in H5T_vlen_disk_isnull () from libhdf5.so.9
#2 0x00007f26607b57ee in H5T__conv_vlen () from libhdf5.so.9
#3 0x00007f2660736849 in H5T_convert () from libhdf5.so.9
#4 0x00007f2660609a36 in H5D_get_create_plist () from libhdf5.so.9
#5 0x00007f26605f48ea in H5Dget_create_plist () from libhdf5.so.9
#6 0x00007f26617c79ff in read_var (grp=0x2598680, datasetid=83886081, obj_name=0x7fff6e1d4cd4 "v", ndims=, dim=0x0) at nc4file.c:1546
#7 0x00007f26617c8e04 in nc4_rec_read_metadata_cb (grpid=, name=, info=, _op_data=) at nc4file.c:1900
#8 0x00007f2660661ec3 in H5G_iterate_cb () from libhdf5.so.9
#9 0x00007f2660663ef7 in H5G__link_iterate_table () from libhdf5.so.9
#10 0x00007f266065a1ec in H5G__compact_iterate () from libhdf5.so.9
#11 0x00007f266066b02b in H5G__obj_iterate () from libhdf5.so.9
#12 0x00007f2660663500 in H5G_iterate () from libhdf5.so.9
#13 0x00007f266069ff0b in H5Literate () from libhdf5.so.9
#14 0x00007f26617c6fb9 in nc4_rec_read_metadata (grp=0x2598680) at nc4file.c:2096
#15 0x00007f26617c765b in NC4_open (path=0x7f2655ca6144 "issue261.nc", mode=, basepe=, chunksizehintp=, use_parallel=, mpidata=, dispatch=0x7f2661a72320, nc_file=0x25a0df0) at nc4file.c:2261
#16 0x00007f2661773913 in NC_open (path=0x7f2655ca6144 "issue261.nc", cmode=4096, basepe=0, chunksizehintp=0x0, useparallel=0, mpi_info=0x0, ncidp=0x7fff6e1d584c) at dfile.c:1777
#17 0x00007f2661773bb7 in nc_open (path=0x2508930 "", mode=1847404544, ncidp=) at dfile.c:589
#18 0x00007f2664b54a24 in pyx_pw_7netCDF4_7Dataset_1__init (__pyx_v_self=0x7f2655205d60, __pyx_args=0x7f2655228fd0, __pyx_kwds=) at netCDF4.c:22619
#19 0x00007f2665c8aebe in type_call (type=, args=0x7f2655228fd0, kwds=0x0) at Objects/typeobject.c:743
#20 0x00007f2665c24ed8 in PyObject_Call (func=0x7f2664da08e0, arg=0x7f2655228fd0, kw=0x0) at Objects/abstract.c:2529
#21 0x00007f2665cd605c in PyEval_EvalFrameEx (f=0x7f266616a050, throwflag=) at Python/ceval.c:4251
#22 0x00007f2665cdc6d1 in PyEval_EvalCodeEx (co=0x7f26660e24b0, globals=, locals=, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#23 0x00007f2665cdc852 in PyEval_EvalCode (co=0x2508930, globals=0x7fff6e1d2800, locals=0x7fff6e1d27f8) at Python/ceval.c:667
#24 0x00007f2665cfd72a in PyRun_FileExFlags (fp=0x1ff13c0, filename=0x7fff6e1d80a3 "issue261.py", start=, globals=0x7f2666157168, locals=0x7f2666157168, closeit=1, flags=0x7fff6e1d5d60) at Python/pythonrun.c:1371
#25 0x00007f2665cfda22 in PyRun_SimpleFileExFlags (fp=0x1ff13c0, filename=0x7fff6e1d80a3 "issue261.py", closeit=1, flags=0x7fff6e1d5d60) at Python/pythonrun.c:949
#26 0x00007f2665d134ec in Py_Main (argc=1712685216, argv=0x7fff6e1d5e78) at Modules/main.c:640
#27 0x00007f2664fff1a6 in __libc_start_main () from /lib/libc.so.6
#28 0x0000000000400679 in _start ()
(gdb)

jdemaria avatar Jul 02 '15 09:07 jdemaria

Using keepweakref=True when opening the Dataset eliminates the segfault for me.

import netCDF4 as nc
for i in xrange(1, 33):
    print(i)
    d = nc.Dataset('issue261.nc',keepweakref=True)

This suggests that the garbage collector is not triggering the Dataset __dealloc__ method, and that some internal data structures inside the HDF5 and/or netcdf library are overflowing when too many files are open. I guess there are two possible solutions:

  1. figure out why the dataset is not going out of scope (where is the reference being kept?), fix that so the files do get closed.

  2. file a netcdf bug report, since the segfaults should not happen when opening 33 files. This will require reproducing the segfault in a simple C program.

Of course, addressing both of these at the same time is probably a good idea.
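One plausible explanation for the first point, sketched here with stand-in classes rather than netCDF4 itself: if a child object holds a strong reference back to its parent, the pair forms a reference cycle, and rebinding the name does not run the parent's cleanup until the cyclic garbage collector gets around to it. The `Parent` and `Child` classes and the `cleanups_after_rebinding` helper below are hypothetical names for illustration only, and the behaviour shown assumes CPython's reference counting.

```python
import gc
import weakref


class Parent(object):
    """Hypothetical stand-in for a Dataset that owns a child object."""

    closed_count = 0  # how many parents have run their cleanup so far

    def __init__(self, use_weakref=False):
        self.child = Child(self, use_weakref)

    def __del__(self):
        # Stand-in for the cleanup that would close the underlying file.
        Parent.closed_count += 1


class Child(object):
    """Hypothetical stand-in for a Variable that points back at its parent."""

    def __init__(self, parent, use_weakref):
        # A strong back-reference creates a Parent <-> Child cycle;
        # a weak proxy does not keep the parent alive.
        self.parent = weakref.proxy(parent) if use_weakref else parent


def cleanups_after_rebinding(n, use_weakref):
    """Rebind one name n times and report how many cleanups have already run."""
    gc.collect()   # clear out leftovers from earlier calls
    gc.disable()   # keep the cyclic collector from running mid-loop
    try:
        Parent.closed_count = 0
        p = None
        for _ in range(n):
            p = Parent(use_weakref)  # the previous object loses its only name
        return Parent.closed_count
    finally:
        gc.enable()
```

With strong back-references the helper reports no cleanups at all, while with weak ones every object except the one still bound to `p` has been cleaned up by the time the loop ends. That matches the observation that keepweakref=True makes the files get closed promptly, though it does not by itself prove where the reference is held in netCDF4.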

jswhit avatar Jul 02 '15 16:07 jswhit

Of course, using the Python context manager also avoids the segfault (by making sure the file is closed).

import netCDF4 as nc
for i in xrange(1, 51):
    print(i)
    with nc.Dataset('issue261.nc') as f:
        print(f)

I have been unable to reproduce the problem in a simple C program (so far).
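For objects that expose close() but are not used as context managers directly, the same guarantee can be had from the standard library's contextlib.closing. The sketch below uses a hypothetical `FakeDataset` class and `max_open_handles` helper (neither is part of netCDF4) to show that at most one handle is open at any time when each one is closed before the next is opened.

```python
import contextlib


class FakeDataset(object):
    """Hypothetical stand-in for an object with a close() method."""

    open_count = 0  # number of instances currently open

    def __init__(self, path):
        self.path = path
        FakeDataset.open_count += 1

    def close(self):
        FakeDataset.open_count -= 1


def max_open_handles(paths):
    """Open each path in turn and return the peak number of open handles."""
    peak = 0
    for path in paths:
        # closing() calls close() on exit, even if the body raises.
        with contextlib.closing(FakeDataset(path)):
            peak = max(peak, FakeDataset.open_count)
    return peak
```

Swapping `FakeDataset` for nc.Dataset should behave the same way, since Dataset has a close() method, but that substitution is an assumption and is not exercised here.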

jswhit avatar Jul 02 '15 18:07 jswhit

The traceback provided by @jdemaria looks similar to one discussed on the h5py list:

https://groups.google.com/forum/#!msg/h5py/3v0oBQ3SVkk/qsCwQnfTxuEJ

jswhit avatar Jul 02 '15 22:07 jswhit

Hi, thanks for your quick answer! I understand from the h5py discussion that the source of the problem is not in the NetCDF C library but is a threading bug in h5py. Am I wrong?

jdemaria avatar Jul 03 '15 08:07 jdemaria

That's what it sounds like, but it happens for me even when OMP_NUM_THREADS=1. I may try recompiling hdf5 without threading enabled and see if that makes a difference.

jswhit avatar Jul 03 '15 14:07 jswhit

The segfault occurs even when hdf5 is compiled with the "threadsafe" option.

jswhit avatar Jul 03 '15 15:07 jswhit

It also occurs if the "with nogil" wrapper around the netcdf library calls is removed. So, it does not look like a thread-related issue.

jswhit avatar Jul 06 '15 14:07 jswhit