netcdf-c
Current issues renaming dimensions and coordinates in netCDF4 files
Old issues with renaming dimensions and coordinates in netCDF4 files are beginning to affect CMIP6 processing, so here's a reminder. All recent library versions are affected. See http://nco.sf.net/nco.html#bug_nc4_rename for a chronology. Create a netCDF4 file with a coordinate:
netcdf bug_rnm {
dimensions:
lon = 4 ;
variables:
float lon(lon) ;
data:
lon = 0, 90, 180, 270 ;
}
Remember to re-create the file after each of the three tests below. Renaming both dimension and coordinate together works. Yay! This surprises me, given the history recounted above:
ncrename -d lon,longitude -v lon,longitude ~/bug_rnm.nc # works
Renaming dimension then coordinate separately fails:
ncrename -d lon,longitude ~/bug_rnm.nc
ncrename -v lon,longitude ~/bug_rnm.nc # borken "HDF error"
Renaming coordinate then dimension separately fails:
ncrename -v lon,longitude ~/bug_rnm.nc
ncrename -d lon,longitude ~/bug_rnm.nc # borken "nc4_reform_coord_var: Assertion `dim_datasetid > 0' failed."
Thanks @czender. We were flooded with other PRs and I'm trying to manage those along with the other open issues. We want to get the next bugfix release out shortly, so I'm trying to address the issues you have open right now (in addition to working on some of the PRs Dennis has for compression, etc.).
Howdy @czender !
I found a test written by Quincey that deals with renaming, nc_test4/tst_rename.c.
I have added to the existing tests and reproduced your error in C code. Here's the test:
fprintf(stderr,"*** Test Charlie's test for renaming...");
{
#define CHARLIE_TEST_FILE "nc_rename_coord_dim.nc"
#define DIM1_NAME "lon"
#define VAR1_NAME "lon"
#define DIM2_NAME "longitude"
#define VAR2_NAME "longitude"
#define DIM1_LEN 4
#define NDIM1 1
int ncid, dimid, varid;
float data[DIM1_LEN] = {0, 90.0, 180.0, 270.0};
if (nc_create(CHARLIE_TEST_FILE, NC_NETCDF4, &ncid)) ERR;
if (nc_def_dim(ncid, DIM1_NAME, DIM1_LEN, &dimid)) ERR;
if (nc_def_var(ncid, VAR1_NAME, NC_FLOAT, NDIM1, &dimid, &varid)) ERR;
if (nc_enddef(ncid)) ERR;
if (nc_put_var_float(ncid, varid, data)) ERR;
if (nc_close(ncid)) ERR;
/* Reopen the file to check. */
if (check_charlies_file(CHARLIE_TEST_FILE, DIM1_NAME, VAR1_NAME)) ERR;
/* Open the file and rename the dimension. */
if (nc_open(CHARLIE_TEST_FILE, NC_WRITE, &ncid)) ERR;
if (nc_redef(ncid)) ERR;
if (nc_rename_dim(ncid, 0, DIM2_NAME)) ERR;
if (nc_enddef(ncid)) ERR;
if (nc_close(ncid)) ERR;
/* Reopen the file to check. */
if (check_charlies_file(CHARLIE_TEST_FILE, DIM2_NAME, VAR1_NAME)) ERR;
/* Open the file and rename the variable. */
if (nc_open(CHARLIE_TEST_FILE, NC_WRITE, &ncid)) ERR;
if (nc_redef(ncid)) ERR;
if (nc_rename_var(ncid, 0, VAR2_NAME)) ERR;
if (nc_enddef(ncid)) ERR;
if (nc_close(ncid)) ERR;
/* Reopen the file to check. */
if (check_charlies_file(CHARLIE_TEST_FILE, DIM2_NAME, VAR2_NAME)) ERR;
This fails like this:
*** Test Charlie's test for renaming...HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 0:
#000: H5Gdeprec.c line 459 in H5Gmove(): couldn't move link
major: Links
minor: Unable to initialize object
#001: H5Gdeprec.c line 541 in H5G_move(): unable to move link
major: Links
minor: Can't move object
#002: H5L.c line 2763 in H5L_move(): unable to find link
major: Symbol table
minor: Object not found
#003: H5Gtraverse.c line 867 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#004: H5Gtraverse.c line 639 in H5G_traverse_real(): traversal operator failed
major: Symbol table
minor: Callback failed
#005: H5L.c line 2623 in H5L_move_cb(): unable to follow symbolic link
major: Symbol table
minor: Object not found
#006: H5Gtraverse.c line 867 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#007: H5Gtraverse.c line 639 in H5G_traverse_real(): traversal operator failed
major: Symbol table
minor: Callback failed
#008: H5L.c line 2484 in H5L_move_dest_cb(): an object with that name already exists
major: Symbol table
minor: Object not found
Sorry! Unexpected result, tst_rename.c, line: 301
I will take a look and see what is going on here...
OK, the problem can be seen from the h5dump:
HDF5 "nc_rename_coord_dim.nc" {
GROUP "/" {
ATTRIBUTE "_NCProperties" {
DATATYPE H5T_STRING {
STRSIZE 67;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "version=1|netcdflibversion=4.5.1-development|hdf5libversion=1.10.1"
}
}
DATASET "lon" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 4 ) / ( 4 ) }
DATA {
(0): 0, 90, 180, 270
}
ATTRIBUTE "DIMENSION_LIST" {
DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): (DATASET 659 /longitude )
}
}
ATTRIBUTE "_Netcdf4Dimid" {
DATATYPE H5T_STD_I32LE
DATASPACE SCALAR
DATA {
(0): 0
}
}
}
DATASET "longitude" {
DATATYPE H5T_IEEE_F32BE
DATASPACE SIMPLE { ( 4 ) / ( 4 ) }
DATA {
(0): 0, 0, 0, 0
}
ATTRIBUTE "CLASS" {
DATATYPE H5T_STRING {
STRSIZE 16;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "DIMENSION_SCALE"
}
}
ATTRIBUTE "NAME" {
DATATYPE H5T_STRING {
STRSIZE 64;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "This is a netCDF dimension but not a netCDF variable. 4"
}
}
ATTRIBUTE "REFERENCE_LIST" {
DATATYPE H5T_COMPOUND {
H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
H5T_STD_I32LE "dimension";
}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): {
DATASET 331 /lon ,
0
}
}
}
}
}
}
When the code hits this line in NC4_rename_var() it fails:
/* Change the HDF5 file, if this var has already been created
there. */
if (var->created)
{
if (H5Gmove(grp->hdf_grpid, var->name, name) < 0)
BAIL(NC_EHDFERR);
}
So the code is trying to rename the dataset with H5Gmove, but can't, because a dataset with the new name already exists. That dataset was created to support the dimension name change; it's one of those dreaded secret datasets that support HDF5 dimension scales.
I guess the way to proceed is to check, before the H5Gmove, whether a dataset with the desired name already secretly exists; if so, delete it, and then make the newly renamed dataset into a dimension scale.
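A rough sketch of that approach (untested; grp, var, name, and the BAIL macro are assumed from the NC4_rename_var() snippet above, and the H5Lexists/H5Ldelete calls come from the standard HDF5 H5L API):

```c
/* Sketch of a possible fix inside NC4_rename_var(). */
if (var->created)
{
   htri_t link_exists;

   /* If a dataset with the target name already exists, it should be
    * the secret dimscale-only dataset created when the dimension was
    * renamed; remove it so the H5Gmove below can succeed. */
   if ((link_exists = H5Lexists(grp->hdf_grpid, name, H5P_DEFAULT)) < 0)
      BAIL(NC_EHDFERR);
   if (link_exists)
      if (H5Ldelete(grp->hdf_grpid, name, H5P_DEFAULT) < 0)
         BAIL(NC_EHDFERR);

   if (H5Gmove(grp->hdf_grpid, var->name, name) < 0)
      BAIL(NC_EHDFERR);

   /* The renamed dataset must then be turned into a dimension scale
    * (e.g. with H5DSset_scale from the HDF5 DS API); omitted here. */
}
```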
I will try to take a stab at that...
Thanks, @edhartnett. It sounds like an intricate fix, so I hope you #persist.
Thanks, All!
This is happening on this branch: https://github.com/NetCDF-World-Domination-Council/netcdf-c/tree/ejh_zender_coordinate_var_rename
So far I have just added the new test that demonstrates the problem.
@czender I am working on this now as part of my current branch. I have been sneaking up on this by writing tests for lots of simpler code in nc4var.c. Now I have done the easy stuff and am moving on to the rename bugs.
To help us understand the urgency of this fix, if any, can you elaborate on how it is being used in CMIP6? And what CMIP6 is?
Are we talking world-shatteringly important climate research here? Or what?
And why the heck are you renaming vars anyway? Can't you scientists just make up your minds? What is the use-case that is driving this? Can you help me understand the importance of this feature?
Thanks!
PS I am going to fix it anyway, obviously. You would have to physically restrain me to stop me from fixing it. I am just curious why you need it.
CMIP6 is the large international climate model intercomparison for the next IPCC report. The CMIP6 organizers provided data in netCDF files for all of the different modeling groups to use. However, each climate model typically looks for a particular name for each input variable it needs, and these names vary from model to model (e.g., sea_surface_temperature, SST, surfTemp). There are hundreds of these, especially when we have to reprocess our output to match the CMIP6 required variable names. Hence the need to rename the variables.
The good news is that I have already worked around this bug (it was too important to wait), but it will be good to get it fixed for the future.
I actually use the NCO utilities, so this is a question for @czender .
Ed, in retrospect, I sure wish you guys had not been seduced by dimension scales :-)
Mea culpa, mea culpa, mea maxima culpa.
@cameronsmith1 thanks for the explanation.
What workaround are you using?
I ended up using the nco utilities of @czender in a different way, so I am not sure how to express it in native HDF commands.
I receive a few bug reports a year from people trying to rename coordinates in netCDF4 files with ncrename. CMIP does this on an industrial scale. Most people end up using the multistep workaround(s) documented in the NCO manual here.
OK I have a fix for this problem, as well as several other rename problems. I am still working on another one I found.
There is a bunch of state code in this which is not helpful. For example there are these values in NC_VAR_INFO_T:
nc_bool_t is_new_var; /* True if variable is newly created */
nc_bool_t was_coord_var; /* True if variable was a coordinate var, but either the dim or var has been renamed */
nc_bool_t became_coord_var; /* True if variable _became_ a coordinate var, because either the dim or var has been renamed */
These are set at various times in nc_rename_var() and nc_rename_dim().
Then in the sync that is called during nc_enddef() these values are consulted and various actions taken.
Problem is, it's not hard to come up with combinations of rename calls which confuse the crap out of the code. There are so many possible ways that users can name and rename vars and dims, with enddefs/redefs thrown in or not. It gets very, very confusing.
What would work better would be statelessness. Whenever a rename call completes, the file on disk and the in-memory metadata must be completely adjusted and in a consistent state. There must be nothing left over from the rename for enddef to do. All work must be done in nc_rename_var() and nc_rename_dim().
If that were true, then the user could call them in any wild order, and it would not matter to us.
I am not attempting to take these out right now. I am patching the existing code so that it works for the test cases that we've identified, including Charlie's. But I suspect there are many more bugs out there with the existing code that will only be eliminated by removing these state variables and dealing with everything in the rename.
OK, these fixes are up as PR #755. It's branch ejh_fill_values on my netcdf clone: git@github.com:NetCDF-World-Domination-Council/netcdf-c.git
@czender if you could grab the branch and give it a try, that would be great. I don't think I've fixed every possible rename problem, but see if your common cases now work OK.
Any problems would be best expressed by adding a test to nc_test4/tst_rename.c. Simply copy an existing test case, then change it to make it fail, and send me the code or post it here.
Thanks for working on this. The web page that @czender mentioned lists a long history of problems with renaming variables, which supports your conclusion that the current implementation is fragile in the face of the many possible renaming situations. Interestingly, @czender notes that a robust workaround is to convert from netCDF4 to netCDF3, rename, and then convert back to netCDF4. Does the implementation of rename for netCDF3 offer a useful template?
Thanks, @edhartnett. I will try to test this soon and report the results here.
@czender you should actually use branch currently up as PR #763.
This PR contains just the rename fixes from the other PR, now closed without merging.
In tst_rename.c there are some commented-out lines which show an unfixed bug:
/* Open the file and rename the variable. This will remove
* the dimscale-only dataset "longitude" and rename the
* existing dataset "lon" to "longitude". Variable
* "longitude" will become a coordinate var. */
/* if (nc_open(CHARLIE_TEST_FILE, NC_WRITE, &ncid)) ERR; */
/* if (nc_redef(ncid)) ERR; */
/* if (nc_rename_var(ncid, 0, LONGITUDE)) ERR; */
/* if (nc_enddef(ncid)) ERR; */
/* if (nc_close(ncid)) ERR; */
Still working on this one. ;-)
I got as far as re-building and testing the latest netCDF master four days ago. It produced a total of 7 new and unexpected failures with the NCO regression test, half with ncks and half with ncpdq. All new failures involved netCDF4 files. I have not yet figured out the exact cause. Tracking that down has higher priority than testing the rename fix for now.
Well, it might be a good idea for us to use the NCO tests as another set of tests for netCDF...
NCO is extremely widely used in the climate community for working with netcdf files, so including them in your test suite would be wonderful.
We already do, as part of our testing. Unfortunately there are two sets of tests: ‘make check’, which returns non-zero when there is a failure, and ‘make test’, which requires manual intervention to catch failures. We run the former but not the latter.
@czender are you seeing failures in make check or make test? I am not seeing failures with make check. I will check with make test after my poster session.
As you suspect it is with "make test", which is always what I mean by "the NCO regression test". For historical reasons, NCO's "make check" is basically just a check that the NCO C++ code library links and runs a single, simple program.
After a little investigation, it turns out that this was a problem with "make test" relying on a file, hdn.nc, that autoconf apparently does not re-create; it is not a netCDF or NCO code issue. Phew. Now I can move on to testing the rename patch...
Excellent! I will hold off, let me know if any regressions pop up. Thank you @czender!
@czender I've just put up another PR (#830) which I believe fixes the test case that you gave me for this issue.
Not sure if all possible rename issues are solved. But if you could try branch ejh_rename_is_here from https://github.com/NetCDF-World-Domination-Council/netcdf-c, and let me know if it now passes your rename tests, that would be great.
Of course, if it does not, I will need the next test case where it fails. ;-)
Thank you for your efforts. It continues to fail on the test previously reported.
zender@skyglow:~$ ncks --lbr
Linked to netCDF library version 4.6.0.1 compiled Feb 1 2018 06:23:57
zender@skyglow:~$ ncrename -O -d lev,z -d lat,y -d lon,x ~/nco/data/in_grp.nc ~/foo.nc
zender@skyglow:~$ ncks -H --trd -s %d -v one ~/foo.nc
ncks: INFO nco_bld_dmn_ids_trv() reports variable </g5/one_dmn_rec_var> with duplicate dimensions
ncks: ERROR netCDF file with duplicate dimension IDs detected. Please use netCDF version at least 4.3.0. NB: Simultaneously renaming multiple dimensions with ncrename can trigger this bug with netCDF versions up to 4.6.0.1 (current as of 20180201).
ncks: INFO reports group information
/: 11 subgroups, 7 dimensions, 1 record dimensions, 5 attributes, 18 variables
#3 dimension: 'coord'(2)
#4 record dimension: 'time'(10)
#5 dimension: 'vrt_nbr'(2)
#6 dimension: 'gds_crd'(8)
#7 dimension: 'y'(2)
#8 dimension: 'x'(4)
#9 dimension: 'z'(3)
/g1: 2 subgroups, 0 dimensions, 0 record dimensions, 1 attributes, 8 variables
/g1/g1g1: 0 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 5 variables
/g1/g1:g2: 0 subgroups, 0 dimensions, 0 record dimensions, 2 attributes, 0 variables
/g2: 0 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 3 variables
/g4: 0 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 4 variables
/g5: 0 subgroups, 2 dimensions, 2 record dimensions, 0 attributes, 3 variables
#7 record dimension: 'time51'(2)
#8 record dimension: 'time52'(10)
/g6: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 2 variables
/g6/g6g1: 0 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 1 variables
/g7: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 6 variables
/g7/g7g1: 0 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 1 variables
/g8: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 0 variables
/g8/g8g1: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 0 variables
/g8/g8g1/g8g1g1: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 0 variables
/g8/g8g1/g8g1g1/g8g1g1g1: 0 subgroups, 0 dimensions, 0 record dimensions, 2 attributes, 0 variables
/g9: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 0 variables
/g9/g9g1: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 1 variables
/g9/g9g1/g9g1g1: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 0 variables
/g9/g9g1/g9g1g1/g9g1g1g1: 1 subgroups, 0 dimensions, 0 record dimensions, 1 attributes, 0 variables
/g9/g9g1/g9g1g1/g9g1g1g1/g9g1g1g1g1: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 0 variables
/g9/g9g1/g9g1g1/g9g1g1g1/g9g1g1g1g1/g9g1g1g1g1g1: 1 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 0 variables
/g9/g9g1/g9g1g1/g9g1g1g1/g9g1g1g1g1/g9g1g1g1g1g1/g9g1g1g1g1g1g1: 0 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 4 variables
/g10: 0 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 2 variables
/g11: 0 subgroups, 0 dimensions, 0 record dimensions, 0 attributes, 17 variables
/g12: 0 subgroups, 0 dimensions, 0 record dimensions, 20 attributes, 0 variables
ncks: INFO reports variable information
Segmentation fault (core dumped)
Can you build with --enable-logging, and then call nc_set_log_level(5) before running those commands?
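For reference, a sketch of that client-side call (it assumes a netCDF library configured with --enable-logging; without such a build, nc_set_log_level is unavailable or a no-op):

```c
/* Sketch: requires a netCDF library configured with --enable-logging. */
#include <netcdf.h>

int main(void)
{
   /* Turn on very verbose internal logging before any file operations. */
   nc_set_log_level(5);
   /* ... nc_open()/nc_rename_dim()/nc_rename_var() calls go here ... */
   return 0;
}
```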
I have tried to build NCO here but could not get it to work.