netcdf-c
HDF error on reading back NC_VLEN variable with fill value and chunking
We are using netcdf-c 4.8.1 and see an HDF error when calling nc_get_var on an NC_VLEN variable that has (1) a fill value set, (2) not all elements written, and (3) chunking applied.
I'm not sure whether this is expected behavior, meaning the chunking or some other step is being applied incorrectly (but in that case, shouldn't it fail on write rather than on read?), or whether it is a bug.
Here is a minimal reproduction. It defines an NC_VLEN variable (of NC_DOUBLE) with one dimension of size 4 and writes only the first 2 elements. A fill value ({0, 101}) and chunking (chunk size 1) are both set.
```cpp
#include <iostream>
#include "netcdf.h"

// Print the netCDF error code and message if a call failed (execution continues).
void checkErrorCode(int status, const char* message) {
    if (status != NC_NOERR) {
        std::cout << "Error code: " << status << " from " << message << std::endl;
        std::cout << nc_strerror(status) << std::endl << std::endl;
    }
}

int main(int argc, const char* argv[]) {
    // ================ WRITE ==================
    // Set up data: only the first two of DATA_LENGTH elements are populated
    const size_t DATA_LENGTH = 4;
    nc_vlen_t data[DATA_LENGTH];
    const int first_size = 2;
    double first[first_size] = {2, 5};
    data[0].p = first;
    data[0].len = first_size;
    const int second_size = 3;
    double second[second_size] = {88, 96, 42};
    data[1].p = second;
    data[1].len = second_size;

    // Create file
    int ncid;
    int retval;
    retval = nc_create("vlenFillValue.nc", NC_NETCDF4, &ncid);
    checkErrorCode(retval, "nc_create");

    // Define vlen type named RAGGED_DOUBLE
    nc_type vlen_typeID;
    retval = nc_def_vlen(ncid, "RAGGED_DOUBLE", NC_DOUBLE, &vlen_typeID);
    checkErrorCode(retval, "nc_def_vlen");

    // Define dimension
    int dimid;
    retval = nc_def_dim(ncid, "xdim", DATA_LENGTH, &dimid);
    checkErrorCode(retval, "nc_def_dim");

    // Define vlen variable
    int varid;
    retval = nc_def_var(ncid, "var", vlen_typeID, 1, &dimid, &varid);
    checkErrorCode(retval, "nc_def_var");

    // Define chunking
    const size_t chunk = 1; // error also with 3
    retval = nc_def_var_chunking(ncid, varid, NC_CHUNKED, &chunk);
    checkErrorCode(retval, "nc_def_var_chunking");

    // Define fill value {0, 101}
    nc_vlen_t fillValue;
    double fv[2] = {0, 101};
    fillValue.p = fv;
    fillValue.len = 2;
    retval = nc_def_var_fill(ncid, varid, NC_FILL, &fillValue);
    checkErrorCode(retval, "nc_def_var_fill");

    // Write elements 0 and 1 only; elements 2 and 3 are left unwritten
    size_t start = 0;
    size_t count = 2;
    retval = nc_put_vara(ncid, varid, &start, &count, data);
    checkErrorCode(retval, "nc_put_vara");

    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close (1)");

    // ================ READ ==================
    // Open file
    retval = nc_open("vlenFillValue.nc", NC_NOWRITE, &ncid);
    checkErrorCode(retval, "nc_open");

    // Reuses varid from the write phase (same file, same variable)
    nc_vlen_t* data_read = new nc_vlen_t[DATA_LENGTH];
    retval = nc_get_var(ncid, varid, data_read);
    checkErrorCode(retval, "nc_get_var");

    // On success, release the memory netCDF allocated for each vlen element
    if (retval == NC_NOERR) {
        nc_free_vlens(DATA_LENGTH, data_read);
    }
    delete[] data_read;

    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close (2)");
    return retval;
}
```
Here is the output (this was run on macOS 11.2.3, but we see the issue on other operating systems too):
```
$ ./a.out
Error code: -101 from nc_get_var
NetCDF: HDF error
```
I see that ncdump also errors out on the produced file:
```
$ ncdump vlenFillValue.nc
netcdf vlenFillValue {
types:
  double(*) RAGGED_DOUBLE ;
dimensions:
  xdim = 4 ;
variables:
  RAGGED_DOUBLE var(xdim) ;
    RAGGED_DOUBLE var:_FillValue = {0, 101} ;
data:
NetCDF: HDF error
```
You might look at the conversation associated with https://github.com/Unidata/netcdf-c/pull/2179. In particular, your example looks like known bug #1 described there.
Thank you, Dennis! This does look potentially related to the known NC_VLEN fill-value bug mentioned in #2179, although I am not sure what kind of failures were observed there (errors or crashes, on reading or writing, etc.). Another issue possibly caused by the same NC_VLEN/fill-value problem: the crash in https://github.com/Unidata/netcdf-c/issues/2181 persists even when the data is "zeroed out", as long as a fill value was used.
But it is good to know that this looks like a bug, and we will keep monitoring it.
My current hypothesis is that HDF5 is not doing a deep copy somewhere in its fill-value handling. That would lead to freeing memory shared between the client and the HDF5 library, which then causes some kind of failure. I cannot prove it, though.
It also sounds somewhat similar to https://github.com/Unidata/netcdf-c/issues/1985.
Hi Dennis! Just wanted to check: is this still an open issue? I know there were a number of NC_VLEN fixes in v4.9.0, but I think you mentioned above that this specific scenario might not be addressed...
There has been no progress on this.