
HDF error on reading back NC_VLEN variable with fill value and chunking

Open krisfed opened this issue 3 years ago • 13 comments

We are using netcdf-c 4.8.1 and seeing an HDF error when using nc_get_var on an NC_VLEN variable that has (1) a fill value set, (2) not all elements filled, and (3) some chunking applied.

I am not sure whether this is expected behavior, meaning that the chunking or some other step of the process is set up incorrectly (but then shouldn't the error occur on writing rather than on reading?), or whether this is a bug.

Here is some simple reproduction code. It defines an NC_VLEN (of NC_DOUBLE) variable with one dimension of size 4 and writes only the first 2 elements. A fill value is set ({0, 101}) and chunking is applied (chunk size 1).

#include <iostream>
#include "netcdf.h"

void checkErrorCode(int status, const char* message){
    if (status != NC_NOERR){
        std::cout << "Error code: " << status << " from " << message << std::endl;
        std::cout << nc_strerror(status) << std::endl << std::endl;
    }
}

int main(int argc, const char * argv[]) {
    
    // ================ WRITE ==================
    
    // Setup data
    const size_t DATA_LENGTH = 4;
    nc_vlen_t data[DATA_LENGTH];
    
    const int first_size = 2;
    double first[first_size] = {2, 5};
    data[0].p = first;
    data[0].len = first_size;
    
    const int second_size = 3;
    double second[second_size] = {88, 96, 42};
    data[1].p = second;
    data[1].len = second_size;

    // Open file
    int ncid;
    int retval;
    
    retval = nc_create("vlenFillValue.nc", NC_NETCDF4, &ncid);
    checkErrorCode(retval, "nc_create");
    
    // Define vlen type named RAGGED_DOUBLE
    nc_type vlen_typeID;
    retval = nc_def_vlen(ncid, "RAGGED_DOUBLE", NC_DOUBLE, &vlen_typeID);
    checkErrorCode(retval, "nc_def_vlen");
    
    // Define dimension
    int dimid;
    retval = nc_def_dim(ncid, "xdim", DATA_LENGTH, &dimid);
    checkErrorCode(retval, "nc_def_dim");
    
    // Define vlen variable
    int varid;
    retval = nc_def_var(ncid, "var", vlen_typeID, 1, &dimid, &varid);
    checkErrorCode(retval, "nc_def_var");
    
    // Define chunking
    const size_t chunk = 1; //error also with 3
    retval = nc_def_var_chunking(ncid, varid, NC_CHUNKED, &chunk);
    checkErrorCode(retval, "nc_def_var_chunking");
    
    // Define fill value
    nc_vlen_t fillValue;
    double fv[2] = {0, 101};
    fillValue.p = fv;
    fillValue.len = 2;
    retval = nc_def_var_fill(ncid, varid, NC_FILL, &fillValue);
    checkErrorCode(retval, "nc_def_var_fill");
    
    // Write vlen variable
    size_t start = 0;
    size_t count = 2;
    retval = nc_put_vara(ncid, varid, &start, &count, data);
    checkErrorCode(retval, "nc_put_vara");
    
    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close (1)");
    
    
    // ================ READ ==================
    
    // open file
    retval = nc_open("vlenFillValue.nc", NC_NOWRITE, &ncid);
    checkErrorCode(retval, "nc_open");
    
    nc_vlen_t* data_read = new nc_vlen_t[DATA_LENGTH];
    
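    retval = nc_get_var(ncid, varid, data_read);
    checkErrorCode(retval, "nc_get_var");
    
    // On success, release the vlen payloads allocated by the library
    if (retval == NC_NOERR) {
        nc_free_vlens(DATA_LENGTH, data_read);
    }
    delete[] data_read;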
    
    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close (2)");
    
    return retval;
}

Here is the output (this was run on macOS 11.2.3, but we see the issue on other operating systems too):

$ ./a.out 
Error code: -101 from nc_get_var
NetCDF: HDF error

I see that ncdump also errors out on the produced file:

$ ncdump vlenFillValue.nc 
netcdf vlenFillValue {
types:
  double(*) RAGGED_DOUBLE ;
dimensions:
	xdim = 4 ;
variables:
	RAGGED_DOUBLE var(xdim) ;
		RAGGED_DOUBLE var:_FillValue = {0, 101} ;
data:

NetCDF: HDF error

krisfed avatar Feb 01 '22 23:02 krisfed

You might look at the conversation associated with this: https://github.com/Unidata/netcdf-c/pull/2179 In particular, your example looks like the known bug #1.

DennisHeimbigner avatar Feb 02 '22 02:02 DennisHeimbigner

Thank you Dennis! It does look potentially related to the known bug about NC_VLEN and fill values mentioned in #2179, although I am not sure what kind of failures were observed there (errors or crashes, on reading or writing, etc). Another issue possibly related to the same NC_VLEN/fill value problem is https://github.com/Unidata/netcdf-c/issues/2181#, where the crash persists even when the data is "zeroed out", as long as a fill value is used.

But it is good to know that this looks like a bug, and we will keep monitoring it.

krisfed avatar Feb 02 '22 18:02 krisfed

My current hypothesis is that HDF5 is not doing a deep copy in some place involving fill values. This would lead to freeing data that is shared between the client and the HDF5 library, which in turn causes some kind of failure. But I cannot prove it.

DennisHeimbigner avatar Feb 02 '22 20:02 DennisHeimbigner

It sounds a little bit similar to https://github.com/Unidata/netcdf-c/issues/1985 too...

krisfed avatar Feb 24 '22 21:02 krisfed

Hi Dennis! Just wanted to check - is this still an active issue? I know there were a bunch of NC_VLEN fixes in v4.9.0, but I think you mentioned above that this specific scenario might not be addressed...

krisfed avatar Aug 25 '22 10:08 krisfed

There has been no progress on this.

DennisHeimbigner avatar Aug 25 '22 22:08 DennisHeimbigner