Unexpected/unnecessary _Netcdf4Dimid attribute created for a non-dimscale dataset

Open dqwu opened this issue 2 years ago • 5 comments

In the simple NetCDF4 file below:

  • lev is a dimension, which has a coordinate variable with the same name.
  • len is a dimension, which has a non-coordinate variable with the same name.
dimensions:
	lev = 2 ;
	len = 4 ;
variables:
	char lev(lev, len) ;
	int len(lev, len) ;

The resulting HDF5 file contains three datasets:

  • lev: used as both a variable and a dimension scale
  • len: used as a dimension scale only
  • _nc4_non_coord_len: used as a variable only
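The split between lev and len above can be sketched as a small classification function. This is only an illustration: the rule it encodes (a variable is a coordinate variable when it shares a dimension's name and uses that dimension as its first dimension) is an assumption inferred from the examples in this thread, and the names var_kind and classify_var are hypothetical, not part of the netCDF-C API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical categories, used only in this sketch. */
typedef enum {
    VAR_PLAIN,       /* name matches no dimension */
    VAR_COORDINATE,  /* shares a dimension's name and uses it as first dimension */
    VAR_NAME_CLASH   /* shares a dimension's name but is not a coordinate variable */
} var_kind;

/*
 * Classify a variable from its name, the names of the dimensions it uses
 * (in order), and the names of all dimensions defined in the group.
 * Assumed rule, not taken from the netCDF-C sources.
 */
var_kind classify_var(const char *var_name,
                      const char *const *var_dims, size_t n_var_dims,
                      const char *const *all_dims, size_t n_all_dims)
{
    assert(var_name != NULL);
    for (size_t i = 0; i < n_all_dims; i++) {
        if (strcmp(all_dims[i], var_name) != 0)
            continue;
        /* The name matches a dimension: look at the variable's first dimension. */
        if (n_var_dims > 0 && strcmp(var_dims[0], var_name) == 0)
            return VAR_COORDINATE;
        return VAR_NAME_CLASH;
    }
    return VAR_PLAIN;
}
```

With dimensions {lev, len} and both variables defined on (lev, len), this sketch classifies lev as VAR_COORDINATE and len as VAR_NAME_CLASH, which matches the three datasets listed above.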

h5dump shows that the _nc4_non_coord_len dataset has an attribute named _Netcdf4Dimid:

   DATASET "_nc4_non_coord_len" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 2, 4 ) / ( 2, 4 ) }
      ...
      ATTRIBUTE "_Netcdf4Dimid" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
   }

As I understand it, the _Netcdf4Dimid attribute should be created only for dimension scale datasets, so it is not expected on a regular dataset like _nc4_non_coord_len.

The following simple NetCDF-4 test program reproduces this issue with NetCDF-C 4.9.0:

#include <stdio.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

int main(int argc, char* argv[])
{
  int ncid;
  int dimids[2];
  int varid_lev;
  int varid_len;

  MPI_Init(&argc, &argv);

  nc_create_par("test.nc", NC_CLOBBER | NC_MPIIO | NC_NETCDF4, MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);

  nc_def_dim(ncid, "lev", 2, &dimids[0]);
  nc_def_dim(ncid, "len", 4, &dimids[1]);

  nc_def_var(ncid, "lev", NC_CHAR, 2, dimids, &varid_lev);
  nc_def_var(ncid, "len", NC_INT, 2, dimids, &varid_len);

  nc_enddef(ncid);

  nc_close(ncid);

  MPI_Finalize();

  return 0;
}

Note: if the line nc_def_var(ncid, "lev", ...) is commented out, the _nc4_non_coord_len dataset no longer gets the unexpected _Netcdf4Dimid attribute.

dqwu, Aug 11 '22 23:08

What are you trying to do with the sample code?

You have dims "lev" and "len" and you create variables "lev" and "len" - but you don't expect them to be coord variables because they have number of dimensions > 1? Is that correct?

edwardhartnett, Aug 12 '22 12:08

The original example comes from the following article on NetCDF-4 dimensions and HDF5 dimension scales: https://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions

   dimension:
      nvec = 3;
      time = 100;
      sample = 345;
      ship = 14;
      ship_strlen = 80;
   variable:
     float data(ship, sample, time, nvec);
     int time(time);
     int sample(time, sample);
     char ship (ship, ship_strlen);

sample is a dimension and a data variable. It's not a coordinate variable.

ship is a dimension and a 2D char coordinate variable.

The data variable sample has a name that conflicts with a dimension name. The netCDF-4 library modifies the HDF5 dataset name by prepending the string _nc4_non_coord_, and removes this prefix when constructing the netCDF variable sample. The issue is that the _Netcdf4Dimid attribute is not expected on the phony dataset _nc4_non_coord_sample, which is not a dimension scale.
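The renaming described above can be sketched in a few lines of C. This is a minimal illustration, not the library's implementation: the prefix _nc4_non_coord_ is taken from the dataset name _nc4_non_coord_sample shown in this thread, while the helper names mangle_name and demangle_name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prefix as it appears in the h5dump output quoted in this thread. */
#define NON_COORD_PREFIX "_nc4_non_coord_"

/*
 * Build the HDF5 dataset name for a variable whose name clashes with a
 * dimension. Returns 0 on success, -1 if the output buffer is too small.
 */
int mangle_name(const char *var_name, char *out, size_t out_size)
{
    assert(var_name != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s%s", NON_COORD_PREFIX, var_name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/*
 * Recover the netCDF variable name from an HDF5 dataset name: skip the
 * prefix if it is present, otherwise return the name unchanged.
 */
const char *demangle_name(const char *hdf5_name)
{
    assert(hdf5_name != NULL);
    size_t prefix_len = strlen(NON_COORD_PREFIX);
    if (strncmp(hdf5_name, NON_COORD_PREFIX, prefix_len) == 0)
        return hdf5_name + prefix_len;
    return hdf5_name;
}
```

In this sketch, mangling "sample" yields "_nc4_non_coord_sample", and demangling that string gives back "sample"; a name without the prefix, such as "ship", passes through unchanged.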

   DATASET "_nc4_non_coord_sample" {
      ...
      ATTRIBUTE "_Netcdf4Dimid" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 1
         }
      }
   }

Note: this issue is not reproducible if the 2D char coordinate variable ship is not defined. In that case the _Netcdf4Dimid attribute is no longer created for the phony _nc4_non_coord_sample dataset.

The new example is a simplified reproducer: lev is similar to ship, and len is similar to sample.

Another example, closer to the original one:

   dimension:
      time = 100;
      sample = 345;
      ship = 14;
      ship_strlen = 80;
   variable:
     int sample(time, sample);
     char ship (ship, ship_strlen);

dqwu, Aug 12 '22 14:08

It seems that this issue is also reproducible for a regular dataset without the _nc4_non_coord_ prefix.

dimensions:
	lat = 2 ;
	lon = 3 ;
	lev = 4 ;
variables:
	int lon(lon) ;
	int lat(lat) ;
	int sample(lat, lev) ;

sample is neither a dimension nor a coordinate variable. However, an unexpected/unnecessary _Netcdf4Dimid attribute is created for its dataset.

[h5dump result]

   DATASET "sample" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 2, 4 ) / ( 2, 4 ) }
      ...
      ATTRIBUTE "_Netcdf4Coordinates" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
         DATA {
         (0): 0, 2
         }
      }
      ATTRIBUTE "_Netcdf4Dimid" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
   }

Test program

#include <stdio.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

int main(int argc, char* argv[])
{
  int ncid;
  int dimids[2];
  int dimid_lat;
  int dimid_lon;
  int dimid_lev;
  int varid_lat;
  int varid_lon;
  int varid_sample;

  MPI_Init(&argc, &argv);

  nc_create_par("test.nc", NC_CLOBBER | NC_MPIIO | NC_NETCDF4, MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);

  nc_def_dim(ncid, "lat", 2, &dimid_lat);
  nc_def_dim(ncid, "lon", 3, &dimid_lon);
  nc_def_dim(ncid, "lev", 4, &dimid_lev);

  nc_def_var(ncid, "lon", NC_INT, 1, &dimid_lon, &varid_lon);
  nc_def_var(ncid, "lat", NC_INT, 1, &dimid_lat, &varid_lat);

  dimids[0] = dimid_lat;
  dimids[1] = dimid_lev;
  nc_def_var(ncid, "sample", NC_INT, 2, dimids, &varid_sample);

  nc_enddef(ncid);

  nc_close(ncid);

  MPI_Finalize();

  return 0;
}

Note: if the first two nc_def_var calls in the test code above are swapped so that lat is defined before lon, the issue is no longer reproducible:

  nc_def_var(ncid, "lat", NC_INT, 1, &dimid_lat, &varid_lat);
  nc_def_var(ncid, "lon", NC_INT, 1, &dimid_lon, &varid_lon);

[h5dump result]

   DATASET "sample" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 2, 4 ) / ( 2, 4 ) }
      ...
      ATTRIBUTE "_Netcdf4Coordinates" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
         DATA {
         (0): 0, 2
         }
      }
   }

dqwu, Aug 13 '22 01:08

@edwardhartnett Could you please try the updated test case when you have time? Thanks.

dqwu, Aug 13 '22 01:08

Yes, but I am right in the middle of major surgery on the NOAA GRIB2 libraries, so give me some time...

edwardhartnett, Aug 18 '22 20:08