Different behavior on ARM Mac vs Intel Mac when passing dimids to nc_def_var
Hi netCDF friends! We're seeing different behavior on Apple Silicon (versus other platforms) in our netCDF test suite during negative tests of our MATLAB netcdf.defVar function (which ultimately calls netCDF's nc_def_var). The short story: passing a NaN as the dimids argument produces an error on Linux, Windows, and Intel Mac, but no error on ARM Mac. We're using netCDF version 4.8.1.
This issue is related to another recent issue I posted: https://github.com/Unidata/netcdf-c/issues/2523
Our MATLAB test creates a netCDF classic file and passes a MATLAB NaN (defined as the IEEE arithmetic representation of NaN as a double scalar value) as the dimids argument to our netcdf.defVar function, which in turn calls our C code that interfaces with the netCDF C library and calls nc_def_var. We expect to get the NC_EBADDIM error (code -46) on Apple Silicon, as we do on Linux, Windows, and Intel Mac. However, we don't get any error on Apple Silicon.
What's happening is that the MATLAB NaN (double) is implicitly cast to a signed 32-bit integer in our C code. On Linux, Windows, and Intel Mac, this yields -2147483648, the smallest possible value for a signed 32-bit integer. But on Apple Silicon, it yields zero. Most likely this is due to differing architecture/compiler implementations of the double-to-int conversion, which according to the C standard is undefined behavior when the value (such as NaN) is not representable in the target type. Thus on Apple Silicon we pass a zero as the dimids argument to nc_def_var, but on the other platforms it's a negative value.
I rooted around the netCDF C library and found code in netcdf/libsrc/var.c (around line 449) that might be the culprit (though I may be way off):

```c
for(ip = varp->dimids, op = varp->shape
    ; ip < &varp->dimids[varp->ndims]; ip++, op++)
{
    if(*ip < 0 || (size_t) (*ip) >= ((dims != NULL) ? dims->nelems : 1) )
        return NC_EBADDIM;
```
I attached my C program (as text file) demonstrating the behavior. Thanks!
Hi Ellen, thanks for reporting this! I haven't had a chance to look at your test file yet, but from the description you've provided, I believe your conclusion is correct. We've had similar issues reported in the past, where the casting behavior is undefined but consistent per architecture. I'll take a look to see what we can do, and what steps we can take to fix this. Thanks!
Thank you Ward!
The only solution I can think of is to disallow 0 as a dimid. However, I bet that breaks some of our test cases and possibly some users' code -- though it shouldn't, because users should not assume they know the dimid.
@DennisHeimbigner It depends on whether the implicit cast is happening upstream and the result is being passed to netCDF, in which case I'm not sure there's anything we can do without introducing a sweeping change that could break the things you point out. Once I get the AGU materials finished I'll take a closer look.
@WardF The implicit cast is present in the test code. 0 is being passed to the netCDF-c library.