
Creating a new variable with a partially filled unlimited dimension raises RuntimeError: Bad chunk sizes.

Open cpaulik opened this issue 9 years ago • 7 comments

This is the minimal example I could find that reproduces the error. As soon as there is some data in the variable, it is no longer possible to create new variables with a chunk size larger than the already-filled dimension.

>>> import numpy as np
>>> import netCDF4 as nc4
>>> ds = nc4.Dataset("temp.nc", mode="w")
>>> ds.createDimension("test", None)
<type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'test', size = 0
>>> var1 = ds.createVariable("data1", float, ("test"), chunksizes=(100,))
>>> var1
<type 'netCDF4._netCDF4.Variable'>
float64 data1(test)
unlimited dimensions: test
current shape = (0,)
filling on, default _FillValue of 9.96920996839e+36 used

>>> var2 = ds.createVariable("data2", float, ("test"), chunksizes=(100,))
>>> var2
<type 'netCDF4._netCDF4.Variable'>
float64 data2(test)
unlimited dimensions: test
current shape = (0,)
filling on, default _FillValue of 9.96920996839e+36 used

>>> var1[:] = np.arange(15)
>>> var1
<type 'netCDF4._netCDF4.Variable'>
float64 data1(test)
unlimited dimensions: test
current shape = (15,)
filling on, default _FillValue of 9.96920996839e+36 used

>>> var2
<type 'netCDF4._netCDF4.Variable'>
float64 data2(test)
unlimited dimensions: test
current shape = (15,)
filling on, default _FillValue of 9.96920996839e+36 used

>>> var3 = ds.createVariable("data3", float, ("test"), chunksizes=(100,))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "netCDF4/_netCDF4.pyx", line 2222, in netCDF4._netCDF4.Dataset.createVariable (netCDF4/_netCDF4.c:17098)
  File "netCDF4/_netCDF4.pyx", line 3181, in netCDF4._netCDF4.Variable.__init__ (netCDF4/_netCDF4.c:29312)
RuntimeError: NetCDF: Bad chunk sizes.
>>> 

This occurs in the following conda environment:

name: test-netcdf-4.4.4
channels:
- conda-forge
dependencies:
- curl=7.45.0=0
- hdf4=4.2.11=4
- hdf5=1.8.17=0
- jpeg=9b=0
- libnetcdf=4.4.0=1
- mkl=11.3.3=0
- netcdf4=1.2.4=np111py27_2
- numpy=1.11.0=py27_1
- openssl=1.0.2h=1
- pip=8.1.2=py27_0
- python=2.7.11=0
- readline=6.2=2
- setuptools=23.0.0=py27_0
- sqlite=3.13.0=0
- tk=8.5.18=0
- wheel=0.29.0=py27_0
- zlib=1.2.8=3

When downgrading libnetcdf to e.g. 4.3.3.1 the error goes away and everything works as expected. So I'm not sure if netcdf4-python must be adapted to support version 4.4 or if this is a bug in the C library.

cpaulik avatar Jun 21 '16 16:06 cpaulik

Almost certainly a C lib issue. @WardF, should we create a unidata-c ticket?

jswhit avatar Jun 22 '16 00:06 jswhit

Thanks. I had the same problem and conda install netcdf4=1.2.2 fixed it for now. I got it to work also when using a chunksize of 1.

jhprinz avatar Nov 13 '16 19:11 jhprinz

AFAIK, nothing has changed in the C lib between 4.3.3.1 and 4.4 that would require a change in netcdf4-python. @jhprinz, when you downgrade to netcdf4-python 1.2.2, are you also downgrading the C lib?

jswhit avatar Nov 16 '16 17:11 jswhit

Yes, I think that happens automatically. The problem was with version 12, I believe, while 4.3.3.1 and 1.2.2 use version 11? I can't check right now, but would that help? When I upgraded using conda I also had to force a reinstall of libnetcdf.

jhprinz avatar Nov 16 '16 17:11 jhprinz

I believe the problem is in nc_def_var_extra() right here:

   /* Chunksizes anyone? */
   if (!ishdf4 && contiguous && *contiguous == NC_CHUNKED)
   {
      var->contiguous = NC_FALSE;

      /* If the user provided chunksizes, check that they are not too
       * big, and that their total size of chunk is less than 4 GB. */
      if (chunksizes)
      {
         if ((retval = check_chunksizes(grp, var, chunksizes)))
            return retval;
         for (d = 0; d < var->ndims; d++) {
            if (var->dim[d]->len > 0 && chunksizes[d] > var->dim[d]->len)
               return NC_EBADCHUNK;
         }

         /* Set the chunksizes for this variable. */
         for (d = 0; d < var->ndims; d++)
            var->chunksizes[d] = chunksizes[d];
      }
   }

The problem is: var->dim[d]->len > 0 && chunksizes[d] > var->dim[d]->len

This code was written with the implicit assumption that all variables would be created before any data was written along the unlimited dimension. Whoops! So this code should additionally check whether dim[d] is NC_UNLIMITED. I don't know how to do that at the moment, but I will take a look.
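The faulty condition, and the exemption for unlimited dimensions suggested above, can be sketched in pure Python (illustrative names only, not the real netCDF API):

```python
def chunk_check(dim_lens, dim_unlimited, chunksizes, skip_unlimited=False):
    """Sketch of the length check in nc_def_var_extra().

    Returns False where the C code would return NC_EBADCHUNK.
    """
    for length, unlim, chunk in zip(dim_lens, dim_unlimited, chunksizes):
        if skip_unlimited and unlim:
            continue  # suggested fix: unlimited dims may grow past any chunk size
        if length > 0 and chunk > length:
            return False
    return True

# Before any data is written, the unlimited dim has length 0 -> check passes.
assert chunk_check([0], [True], [100])
# After 15 records are written, the same createVariable call is rejected (the bug).
assert not chunk_check([15], [True], [100])
# With the unlimited-dimension exemption, it passes again.
assert chunk_check([15], [True], [100], skip_unlimited=True)
```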

edhartnett avatar Nov 13 '17 00:11 edhartnett

Update on this issue: the fix on the C side is in. See Unidata/netcdf-c#760

Lnaden avatar Jan 25 '18 19:01 Lnaden

This has been fixed in libnetcdf 4.6.0.

Lnaden avatar Mar 12 '18 12:03 Lnaden