netcdf-c
Unexpected/unnecessary _Netcdf4Dimid attribute created for a non-dimscale dataset
In the simple NetCDF4 file below:
- lev is a dimension, which has a coordinate variable with the same name.
- len is a dimension, which has a non-coordinate variable with the same name.
dimensions:
    lev = 2 ;
    len = 4 ;
variables:
    char lev(lev, len) ;
    int len(lev, len) ;
The resulting HDF5 file contains three datasets:
- lev: used as both a variable and a dimension scale
- len: used as a dimension scale only
- _nc4_non_coord_len: used as a variable only
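To make the distinction between the two variables concrete, here is a minimal sketch of the classification. It assumes the rule is that a variable counts as a coordinate variable when its name matches the name of its first dimension; that rule is my assumption and matches the description above, but it is not taken from the library source. The `var_def` struct and `is_coord_var` function are hypothetical names used only for illustration.

```c
#include <string.h>

/* Hypothetical, simplified model of a netCDF variable definition. */
typedef struct {
    const char *name;     /* variable name */
    int ndims;            /* number of dimensions */
    const char *dims[4];  /* dimension names, in declaration order */
} var_def;

/* Returns 1 if the variable would be a coordinate variable under the
 * ASSUMED rule: its name equals the name of its first dimension. */
int is_coord_var(const var_def *v)
{
    return v->ndims > 0 && strcmp(v->name, v->dims[0]) == 0;
}
```

Under this assumption, `char lev(lev, len)` is a coordinate variable because its first dimension is `lev`, while `int len(lev, len)` is not because its first dimension is `lev` rather than `len`.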
h5dump shows that the _nc4_non_coord_len dataset has an attribute named _Netcdf4Dimid:
DATASET "_nc4_non_coord_len" {
   DATATYPE H5T_STD_I32LE
   DATASPACE SIMPLE { ( 2, 4 ) / ( 2, 4 ) }
   ...
   ATTRIBUTE "_Netcdf4Dimid" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SCALAR
      DATA {
      (0): 0
      }
   }
}
As I understand it, the _Netcdf4Dimid attribute should be created only for dimension scale datasets, so it is not expected on a regular dataset like _nc4_non_coord_len.
The following simple NetCDF-4 test program reproduces this issue with NetCDF-C 4.9.0:
#include <stdio.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

/* Return codes are not checked, to keep the reproducer short. */
int main(int argc, char* argv[])
{
    int ncid;
    int dimids[2];
    int varid_lev;
    int varid_len;

    MPI_Init(&argc, &argv);
    nc_create_par("test.nc", NC_CLOBBER | NC_MPIIO | NC_NETCDF4, MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);

    /* dimids[0] is lev, dimids[1] is len */
    nc_def_dim(ncid, "lev", 2, &dimids[0]);
    nc_def_dim(ncid, "len", 4, &dimids[1]);

    /* lev(lev, len) is a coordinate variable; len(lev, len) is not */
    nc_def_var(ncid, "lev", NC_CHAR, 2, dimids, &varid_lev);
    nc_def_var(ncid, "len", NC_INT, 2, dimids, &varid_len);

    nc_enddef(ncid);
    nc_close(ncid);
    MPI_Finalize();
    return 0;
}
Note: if we comment out the nc_def_var(ncid, "lev", ...) line, the _nc4_non_coord_len dataset no longer gets this unexpected _Netcdf4Dimid attribute.
What are you trying to do with the sample code?
You have dims "lev" and "len" and you create variables "lev" and "len" - but you don't expect them to be coord variables because they have number of dimensions > 1? Is that correct?
The original example comes from the following article on NetCDF-4 dimensions and HDF5 dimension scales: https://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions
dimension:
    nvec = 3;
    time = 100;
    sample = 345;
    ship = 14;
    ship_strlen = 80;
variable:
    float data(ship, sample, time, nvec);
    int time(time);
    int sample(time, sample);
    char ship (ship, ship_strlen);
sample is a dimension and a data variable. It's not a coordinate variable.
ship is a dimension and a 2D char coordinate variable.
Data variable sample has a name that conflicts with a dimension name. The netCDF4 library modifies the HDF5 dataset name by prepending the string _nc4_non_coord_, and removes this string when constructing the netCDF variable sample. The issue is that the _Netcdf4Dimid attribute is not expected on the phony dataset _nc4_non_coord_sample, which is not a dimension scale.
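The renaming described above can be sketched in a few lines of C. This is an illustration only, not the library's code: the prefix string is taken from the dataset name in the h5dump output, while the function names `hdf5_name_for_var` and `netcdf_name_for_dataset` and the `clashes_with_dim` flag are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Prefix seen in the h5dump output for renamed datasets. */
#define NON_COORD_PREFIX "_nc4_non_coord_"

/* Builds the HDF5 dataset name for a netCDF variable.
 * clashes_with_dim is nonzero when the variable shares its name with a
 * dimension but is not that dimension's coordinate variable.
 * Returns 0 on success, -1 if the output buffer is too small. */
int hdf5_name_for_var(const char *var_name, int clashes_with_dim,
                      char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s%s",
                     clashes_with_dim ? NON_COORD_PREFIX : "", var_name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Recovers the netCDF variable name from an HDF5 dataset name by skipping
 * the prefix when present. The result points into hdf5_name. */
const char *netcdf_name_for_dataset(const char *hdf5_name)
{
    size_t plen = strlen(NON_COORD_PREFIX);
    return strncmp(hdf5_name, NON_COORD_PREFIX, plen) == 0
               ? hdf5_name + plen
               : hdf5_name;
}
```

With this sketch, the variable sample maps to the dataset name _nc4_non_coord_sample and back again, while ship keeps its name because it is the coordinate variable for the ship dimension.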
DATASET "_nc4_non_coord_sample" {
   ...
   ATTRIBUTE "_Netcdf4Dimid" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SCALAR
      DATA {
      (0): 1
      }
   }
}
Note: this issue is not reproducible if the 2D char coordinate variable ship is not defined. In that case the _Netcdf4Dimid attribute is no longer created for the phony _nc4_non_coord_sample dataset.
The new example is a simplified reproducer: lev is similar to ship, and len is similar to sample.
Another example, which is closer to the original one:
dimension:
    time = 100;
    sample = 345;
    ship = 14;
    ship_strlen = 80;
variable:
    int sample(time, sample);
    char ship (ship, ship_strlen);
It seems that this issue is also reproducible for a regular dataset without the _nc4_non_coord_ prefix.
dimensions:
    lat = 2 ;
    lon = 3 ;
    lev = 4 ;
variables:
    int lon(lon) ;
    int lat(lat) ;
    int sample(lat, lev) ;
sample is neither a dimension nor a coordinate variable. However, an unexpected/unnecessary _Netcdf4Dimid attribute is created for its dataset.
[h5dump result]
DATASET "sample" {
   DATATYPE H5T_STD_I32LE
   DATASPACE SIMPLE { ( 2, 4 ) / ( 2, 4 ) }
   ...
   ATTRIBUTE "_Netcdf4Coordinates" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 2
      }
   }
   ATTRIBUTE "_Netcdf4Dimid" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SCALAR
      DATA {
      (0): 0
      }
   }
}
Test program
#include <stdio.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

/* Return codes are not checked, to keep the reproducer short. */
int main(int argc, char* argv[])
{
    int ncid;
    int dimids[2];
    int dimid_lat;
    int dimid_lon;
    int dimid_lev;
    int varid_lat;
    int varid_lon;
    int varid_sample;

    MPI_Init(&argc, &argv);
    nc_create_par("test.nc", NC_CLOBBER | NC_MPIIO | NC_NETCDF4, MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);

    /* Dimensions are defined in the order lat, lon, lev */
    nc_def_dim(ncid, "lat", 2, &dimid_lat);
    nc_def_dim(ncid, "lon", 3, &dimid_lon);
    nc_def_dim(ncid, "lev", 4, &dimid_lev);

    /* Coordinate variables are defined in the opposite order: lon, then lat */
    nc_def_var(ncid, "lon", NC_INT, 1, &dimid_lon, &varid_lon);
    nc_def_var(ncid, "lat", NC_INT, 1, &dimid_lat, &varid_lat);

    /* sample(lat, lev) is a plain data variable */
    dimids[0] = dimid_lat;
    dimids[1] = dimid_lev;
    nc_def_var(ncid, "sample", NC_INT, 2, dimids, &varid_sample);

    nc_enddef(ncid);
    nc_close(ncid);
    MPI_Finalize();
    return 0;
}
Note: if we swap the order of the first two nc_def_var calls in the above test code so that lat is defined before lon, this issue is not reproducible:
nc_def_var(ncid, "lat", NC_INT, 1, &dimid_lat, &varid_lat);
nc_def_var(ncid, "lon", NC_INT, 1, &dimid_lon, &varid_lon);
[h5dump result]
DATASET "sample" {
   DATATYPE H5T_STD_I32LE
   DATASPACE SIMPLE { ( 2, 4 ) / ( 2, 4 ) }
   ...
   ATTRIBUTE "_Netcdf4Coordinates" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 2
      }
   }
}
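For what it is worth, all the cases reported in this thread fit one simple pattern, sketched below. This is only a hypothesis drawn from the h5dump results, not a statement about how the library is implemented: the attribute seems to appear on every variable's dataset when some coordinate variable has more than one dimension, or when coordinate variables are defined in a different order than their dimensions. In each dump, the attribute value equals the ID of the variable's first dimension, assuming dimension IDs are assigned in definition order. The `var_info` struct and all function names are hypothetical.

```c
#include <string.h>

#define MAX_DIMS 4

/* Hypothetical, simplified model of a variable definition. */
typedef struct {
    const char *name;      /* variable name */
    int ndims;             /* number of dimensions */
    int dimids[MAX_DIMS];  /* dimension IDs, in declaration order */
} var_info;

/* dim_names[i] is the name of the dimension with ID i.
 * Returns the ID of the dimension this variable is a coordinate variable
 * for, or -1 if it is not one (ASSUMED rule: name matches first dimension). */
static int coord_dimid(const var_info *v, const char *const *dim_names)
{
    if (v->ndims < 1)
        return -1;
    return strcmp(v->name, dim_names[v->dimids[0]]) == 0 ? v->dimids[0] : -1;
}

/* HYPOTHESIS: returns 1 when the thread's observations suggest that
 * _Netcdf4Dimid gets written on every variable's dataset. vars must be
 * listed in the order of the nc_def_var calls. */
int predicts_dimid_attr(const var_info *vars, int nvars,
                        const char *const *dim_names)
{
    int last = -1;  /* dimension ID of the previous coordinate variable */
    for (int i = 0; i < nvars; i++) {
        int d = coord_dimid(&vars[i], dim_names);
        if (d < 0)
            continue;              /* not a coordinate variable */
        if (vars[i].ndims > 1)
            return 1;              /* multi-dimensional coordinate variable */
        if (d < last)
            return 1;              /* defined out of dimension order */
        last = d;
    }
    return 0;
}

/* Attribute value seen in the dumps: the ID of the first dimension. */
int observed_dimid_value(const var_info *v)
{
    return v->dimids[0];
}
```

If this pattern holds, it would explain why removing the 2D coordinate variable (lev or ship) or defining lat before lon makes the attribute disappear.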
@edwardhartnett Could you please try the updated test case when you have time? Thanks.
Yes, but I am right in the middle of major surgery on the NOAA GRIB2 libraries, so give me some time...