netcdf4-python
Creating a new variable with a partially filled unlimited dimension raises RuntimeError: NetCDF: Bad chunk sizes.
This is the minimal example I could find that reproduces the error. It shows that as soon as there is data in the variable, it is no longer possible to create new variables with a chunk size larger than the already-filled dimension.
>>> import numpy as np
>>> import netCDF4 as nc4
>>> ds = nc4.Dataset("temp.nc", mode="w")
>>> ds.createDimension("test", None)
<type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'test', size = 0
>>> var1 = ds.createVariable("data1", float, ("test"), chunksizes=(100,))
>>> var1
<type 'netCDF4._netCDF4.Variable'>
float64 data1(test)
unlimited dimensions: test
current shape = (0,)
filling on, default _FillValue of 9.96920996839e+36 used
>>> var2 = ds.createVariable("data2", float, ("test"), chunksizes=(100,))
>>> var2
<type 'netCDF4._netCDF4.Variable'>
float64 data2(test)
unlimited dimensions: test
current shape = (0,)
filling on, default _FillValue of 9.96920996839e+36 used
>>> var1[:] = np.arange(15)
>>> var1
<type 'netCDF4._netCDF4.Variable'>
float64 data1(test)
unlimited dimensions: test
current shape = (15,)
filling on, default _FillValue of 9.96920996839e+36 used
>>> var2
<type 'netCDF4._netCDF4.Variable'>
float64 data2(test)
unlimited dimensions: test
current shape = (15,)
filling on, default _FillValue of 9.96920996839e+36 used
>>> var3 = ds.createVariable("data3", float, ("test"), chunksizes=(100,))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "netCDF4/_netCDF4.pyx", line 2222, in netCDF4._netCDF4.Dataset.createVariable (netCDF4/_netCDF4.c:17098)
File "netCDF4/_netCDF4.pyx", line 3181, in netCDF4._netCDF4.Variable.__init__ (netCDF4/_netCDF4.c:29312)
RuntimeError: NetCDF: Bad chunk sizes.
>>>
This occurs in the following conda environment:
name: test-netcdf-4.4.4
channels:
- conda-forge
dependencies:
- curl=7.45.0=0
- hdf4=4.2.11=4
- hdf5=1.8.17=0
- jpeg=9b=0
- libnetcdf=4.4.0=1
- mkl=11.3.3=0
- netcdf4=1.2.4=np111py27_2
- numpy=1.11.0=py27_1
- openssl=1.0.2h=1
- pip=8.1.2=py27_0
- python=2.7.11=0
- readline=6.2=2
- setuptools=23.0.0=py27_0
- sqlite=3.13.0=0
- tk=8.5.18=0
- wheel=0.29.0=py27_0
- zlib=1.2.8=3
When downgrading libnetcdf to e.g. 4.3.3.1 the error goes away and everything works as expected. So I'm not sure whether netcdf4-python needs to be adapted to support version 4.4 or whether this is a bug in the C library.
Almost certainly a C lib issue. @WardF, should we create a unidata-c ticket?
Thanks. I had the same problem, and conda install netcdf4=1.2.2 fixed it for now. I also got it to work by using a chunk size of 1.
AFAIK, nothing has changed in the C lib between 4.3.3.1 and 4.4 that would require a change in netcdf4-python. @jhprinz, when you downgrade to netcdf4-python 1.2.2, are you also downgrading the C lib?
Yes, I think that happens automatically. I believe the problem was with version 12, while 4.3.3.1 and 1.2.2 use version 11? I can't check right now, but would that help? When I upgraded using conda, I also had to do a forced reinstall of libnetcdf.
I believe the problem is in nc_def_var_extra() right here:

   /* Chunksizes anyone? */
   if (!ishdf4 && contiguous && *contiguous == NC_CHUNKED)
   {
      var->contiguous = NC_FALSE;

      /* If the user provided chunksizes, check that they are not too
       * big, and that their total size of chunk is less than 4 GB. */
      if (chunksizes)
      {
         if ((retval = check_chunksizes(grp, var, chunksizes)))
            return retval;

         for (d = 0; d < var->ndims; d++) {
            if (var->dim[d]->len > 0 && chunksizes[d] > var->dim[d]->len)
               return NC_EBADCHUNK;
         }

         /* Set the chunksizes for this variable. */
         for (d = 0; d < var->ndims; d++)
            var->chunksizes[d] = chunksizes[d];
      }
   }
The problem is this check: var->dim[d]->len > 0 && chunksizes[d] > var->dim[d]->len
This code was written with the implicit assumption that all variables would be created before any data was written to the unlimited dimension. Whoops! In addition, this check should skip dimensions that are NC_UNLIMITED. I don't know how to do that at the moment, but I will take a look.
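The intended fix can be modeled outside the C library (this is an illustrative Python sketch of the validation logic, not the actual patch; names and error values are stand-ins): the length comparison must be skipped for unlimited dimensions, since their current length says nothing about how large they may grow.

```python
# Stand-in error codes for illustration only.
NC_NOERR = 0
NC_EBADCHUNK = -1

def validate_chunksizes(dim_lens, unlimited, chunksizes):
    """Model of the chunk-size check in nc_def_var_extra().

    Reject a chunk size larger than a *fixed* dimension's length;
    unlimited dimensions are exempt, which is the fix described above.
    """
    for length, is_unlim, chunk in zip(dim_lens, unlimited, chunksizes):
        if not is_unlim and length > 0 and chunk > length:
            return NC_EBADCHUNK
    return NC_NOERR

# Before the fix, an unlimited dimension grown to length 15 rejected a
# chunk size of 100; with the unlimited exemption it is accepted:
assert validate_chunksizes([15], [True], [100]) == NC_NOERR
# A fixed dimension of length 15 still rejects a chunk size of 100:
assert validate_chunksizes([15], [False], [100]) == NC_EBADCHUNK
```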
Update on this issue: the fix on the C side is in. See Unidata/netcdf-c#760
This has been fixed in libnetcdf 4.6.0.