netcdf4-python

accessing unset entries in VLEN variable causes crash

Open jhprinz opened this issue 9 years ago • 18 comments

I created a VLEN variable of numpy.float64 whose first (and only) dimension is unlimited:

<type 'netCDF4._netCDF4.Variable'>
vlen test(mydim1)
    var_type: numpy.float64
vlen data type: float64
unlimited dimensions: mydim1
current shape = (3,)

Currently the dimension mydim1 has length 3, although I have only set values for entries 0 and 1, like this:

var = ncfile.variables['test']

var[0] = np.array([1.0, ...])
var[1] = np.array([0.1, ...])

When I try to load var[3], I get the expected KeyError, since the variable holds only 3 elements, not 4. However, doing this

print var[2]

>>> Crash

will crash my IPython kernel. The same is true for slicing:

print var[:]

>>> Crash

I am not sure what the intended behavior is for reading unset values in VLEN variables, or whether this is also a problem in the wrapped netCDF C library.

jhprinz avatar Feb 11 '16 14:02 jhprinz

You definitely shouldn't get a crash. Does this simple example work for you?

from netCDF4 import Dataset
import numpy as np
nc = Dataset('test.nc','w')
vlen_type = nc.createVLType(np.float64,'vltest')  # np.float64 replaces the removed np.float alias
nc.createDimension('x',None)  # unlimited dimension
v = nc.createVariable('vl',vlen_type,'x')
v[0]=np.arange(2,dtype=np.float64)
v[1]=np.arange(3,dtype=np.float64)
print v[:]
print v[2]  # out of range: only entries 0 and 1 exist
nc.close()

You should get

[array([ 0.,  1.]) array([ 0.,  1.,  2.])]
Traceback (most recent call last):
  File "test_vlen.py", line 10, in <module>
    print v[2]
  File "_netCDF4.pyx", line 3646, in netCDF4._netCDF4.Variable.__getitem__ (netCDF4/_netCDF4.c:32443)
  File "_netCDF4.pyx", line 4398, in netCDF4._netCDF4.Variable._get (netCDF4/_netCDF4.c:40825)
IndexError

jswhit avatar Feb 11 '16 16:02 jswhit

Yes, thanks. This works fine. I played around a little to find a simple example that fails.

The example below does not work. It creates two variables that share the same dimension. The fixed-length variable is written first, which increases the current length to 3, and then the VLEN variable is written. I installed netCDF4 with conda. The current version is

netcdf4                   1.2.2               np110py27_0    defaults

netcdf

from netCDF4 import Dataset
import numpy as np
nc = Dataset('test.nc','w')
vlen_type = nc.createVLType(np.float64,'vltest')
nc.createDimension('x', None)
v = nc.createVariable('vl', vlen_type, 'x')
w = nc.createVariable('vl2', np.float64, 'x')
w[0:3] = np.arange(3,dtype=np.float64)
v[0]=np.arange(200000,dtype=np.float64)
v[1]=np.arange(3000000,dtype=np.float64)
print v[2]
print v[:]
nc.close()

jhprinz avatar Feb 11 '16 20:02 jhprinz

Confirmed. Seems like a C library issue though. If you comment out the print statements in your example, it does not crash. However, running ncdump on the resulting file does segfault. Seems like what we need is a C version of this code that triggers the segfault. Once we have that, a netcdf-c issue can be opened.
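
A crash like the ncdump segfault can be detected from a script, without taking down the calling interpreter, by running the tool in a child process and inspecting its exit status. The following is only a sketch under stated assumptions: the helper names `describe_exit` and `run_ncdump` are invented for illustration, `ncdump` is assumed to be on the PATH, and the system is assumed to be POSIX, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a child killed by signal N is reported as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode


def run_ncdump(path):
    """Run ncdump on `path` in a child process and describe how it ended.

    Assumes the ncdump executable is installed; its output is discarded
    because only the exit status matters here.
    """
    result = subprocess.run(
        ["ncdump", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(result.returncode)
```

Calling `run_ncdump('test.nc')` on the file written by the example script would then distinguish a clean dump from a segfault without crashing the caller.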

jswhit avatar Feb 12 '16 00:02 jswhit

Perhaps it would be sufficient to attach the netcdf file generated by the example code above to a netcdf-c issue.

jswhit avatar Feb 12 '16 00:02 jswhit

cc: @WardF

dopplershift avatar Feb 12 '16 00:02 dopplershift

Thanks, @dopplershift. This had slipped beneath my ra.. my notice. The file that causes the crash would be sufficient for tracking down the issue and I can probably craft a C program if need be.

WardF avatar Feb 12 '16 03:02 WardF

Here's the netcdf file:

test.nc.gz

jswhit avatar Feb 12 '16 03:02 jswhit

As you can see in the timeline above, I've opened an issue to track this on the netcdf-c end. I'll report back in here when the issue is fixed, or you can track progress over at https://github.com/Unidata/netcdf-c/issues/221.

WardF avatar Feb 12 '16 17:02 WardF

Ok, there is/was a logic flaw when using an NC_UNLIMITED dimension with a VLEN data type. More testing is needed to ensure I haven't added a different bug, but I'm somewhat optimistic at this point (the other tests are passing). I need to:

  • [ ] Wire new tests into cmake and autotools.
  • [ ] Expand the test beyond the basic case.

I won't jinx things by forecasting a timeframe for the fix, but I want to close this ASAP, this week if at all possible.

WardF avatar Feb 18 '16 22:02 WardF

The fix for the related netcdf-c issue has been merged into netcdf-c:master; see https://github.com/Unidata/netcdf-c/pull/224 for details. The issue was in reading the file, not writing it, so the same test file can be used with the new version. I've closed the related netcdf-c issue, but will reopen if need be; it's plausible I've missed an edge case. Thanks again for letting me know about the issue!

WardF avatar Feb 19 '16 20:02 WardF

It's been a while, but thank you guys for taking care of this!

jhprinz avatar Jun 30 '16 10:06 jhprinz

@jhprinz Does that mean this issue is fixed for you now?

dopplershift avatar Jun 30 '16 16:06 dopplershift

@dopplershift A quick update: I just rechecked. With conda install netcdf4=1.2.2 and libnetcdf=4.3.3.1, it already fails at the first print statement, print v[2].

After the update to conda install netcdf4=1.2.4 and libnetcdf=4.4.1, it still fails, but only at the second print statement, print v[:].

Currently I do not need this feature, so I consider this closed from my side, but I guess this is still not the desired behaviour. I am happy to help sort this out if I can, though.

I currently still use 1.2.2 because of some chunking related problems, but there is already an issue on that one.
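
When comparing results across installations like this, it can help to print the Python package version and the C library version together. A small sketch, assuming the `netCDF4` module exposes `__version__` and `__netcdf4libversion__`; `version_tuple` and `report_versions` are hypothetical helper names, not part of the library.

```python
import re


def version_tuple(version):
    """Convert a dotted version string such as '4.3.3.1' into a tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    and any non-numeric suffix on a component is ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def report_versions():
    """Return (package_version, c_library_version), or None if netCDF4 is missing."""
    try:
        import netCDF4
    except ImportError:
        return None
    # Assumption: both attributes are present in the installed release.
    return netCDF4.__version__, netCDF4.__netcdf4libversion__
```

Tuples compare element by element, so `version_tuple('4.3.3.1') < version_tuple('4.4.1')` orders the two libnetcdf releases mentioned above as expected.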

jhprinz avatar Nov 13 '16 20:11 jhprinz

Ok, now I'm running into this, though slightly differently:

from netCDF4 import Dataset
import numpy as np
nc = Dataset('test.nc', 'w')
vlen_type = nc.createVLType(np.float64, 'vltest')
nc.createDimension('x', None)
v = nc.createVariable('vl', vlen_type, 'x')
w = nc.createVariable('vl2', np.float64, 'x')
w[0:3] = np.arange(3, dtype=np.float64)
print(v[0])  # prints '[]' or sometimes crashes
print(v[0].tolist())  # prints '[]' or sometimes crashes
print(v[0].size)  # BOOM!
nc.close()

At the terminal:

python(5287,0x7fffa21c83c0) malloc: *** error for object 0xf000000010dc9c88: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

Here's what I'm running:

libnetcdf                 4.4.1                         0    conda-forge
netcdf4                   1.2.4               np111py35_2    conda-forge

I'm open to suggestions here. My use case is to be able to append to a vlen variable, so I kind of need to be able to access what's there already. 😀

dopplershift avatar Nov 16 '16 00:11 dopplershift

Seems like there may be some lingering issues in the netcdf-c vlen code. It would be nice to have a C program that triggers the crash.

jswhit avatar Nov 16 '16 02:11 jswhit

Just a reminder that via netcdf-c, one cannot modify the length of a vlen without rewriting the top-level variable.
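
In practice this means that appending to a VLEN entry is a read, concatenate, and rewrite of the whole entry. A minimal sketch of that pattern follows. The helper `append_to_entry` and the `n_written` counter are hypothetical: the counter assumes the caller tracks how many leading entries have actually been written, so that unset entries are never read. The helper relies only on indexing, so a plain list can stand in for the variable; with a real netCDF4 VLEN variable the combined values would presumably need to be converted to a NumPy array of the VLEN base type before assignment.

```python
def append_to_entry(var, index, new_values, n_written):
    """Append new_values to entry `index` by rewriting the whole entry.

    var        -- anything indexable (a stand-in for a VLEN variable)
    index      -- entry to extend; must not skip past the written region
    new_values -- iterable of values to add at the end of the entry
    n_written  -- number of leading entries known to hold data

    Returns the updated number of written entries.
    """
    if index > n_written:
        raise IndexError("entry %d would leave unset entries before it" % index)
    if index < n_written:
        existing = list(var[index])  # this entry was written earlier, so read it
    else:
        existing = []                # unset entry: never read it
    var[index] = existing + list(new_values)
    return max(n_written, index + 1)
```

Keeping the count outside the file is a design choice for this sketch; it could just as well be stored in a separate fixed-length variable or an attribute.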

DennisHeimbigner avatar Nov 16 '16 02:11 DennisHeimbigner

@dopplershift Please see the pull request #605 for a possible fix. (I noticed that vldata allocated by Variable._get(...) may still be uninitialized when it's passed to nc_free_vlens(), which may result in calling free with a random argument.)

ckhroulev avatar Nov 16 '16 09:11 ckhroulev

Pull request #605 fixes @dopplershift's test script for me. Thanks @ckhroulev!

jswhit avatar Nov 16 '16 16:11 jswhit