
Different behavior on ARM Mac vs Intel Mac when passing a NaN dimid to nc_def_var

Open ellenjohnson opened this issue 3 years ago • 6 comments

Hi netCDF friends! We're seeing different behavior on Apple Silicon (versus other platforms) in our netCDF test suite during negative tests of our MATLAB netcdf.defVar function, which ends up calling netCDF's nc_def_var. The short story: passing a NaN as the dimids argument produces an error on Linux, Windows, and Intel Mac, but not on ARM Mac, because on ARM Mac the NaN reaches nc_def_var as a zero. We're using netCDF version 4.8.1.

This issue is related to another recent issue I posted: https://github.com/Unidata/netcdf-c/issues/2523

Our MATLAB test creates a netCDF classic file and passes a MATLAB NaN (the IEEE arithmetic representation of NaN as a double scalar value) as the dimids argument to our netcdf.defVar function, which in turn calls our C code that interfaces with the netCDF C library and calls nc_def_var. We expect nc_def_var to return NC_EBADDIM (error code -46) on Apple Silicon, as it does on Linux, Windows, and Intel Mac. However, we don't get any error on Apple Silicon.

What's happening is that the MATLAB NaN (a double) is implicitly converted to a signed 32-bit integer in our C code. On Linux, Windows, and Intel Mac, this results in -2147483648, the smallest possible signed 32-bit integer. On Apple Silicon, it results in zero. This is most likely due to differences in how each architecture/compiler implements double-to-int conversion: the C standard leaves the behavior undefined when the double's value (such as NaN) cannot be represented in the target integer type. Thus on Apple Silicon we pass zero as the dimension ID to nc_def_var, while on the other platforms it's a negative value.
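One way to avoid depending on the undefined conversion is to validate the double on the caller's side before converting it. The following is a minimal sketch, assuming a hypothetical helper `double_to_dimid` in the C interface code (it is not part of netCDF or MATLAB). It rejects NaN, infinities, negatives, out-of-range values, and fractional values, and only then performs the cast, which is well defined at that point.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical helper (not a netCDF or MATLAB API): converts a double,
 * such as a MATLAB scalar, to a dimension ID only when the conversion is
 * well defined. Returns 0 and stores the ID on success, or -1 if the
 * value is NaN, infinite, negative, larger than INT_MAX, or fractional. */
static int double_to_dimid(double value, int *dimid)
{
    if (!isfinite(value))                 /* rejects NaN and +/-Inf */
        return -1;
    if (value < 0.0 || value > (double)INT_MAX)
        return -1;                        /* dimids are non-negative ints */
    int id = (int)value;                  /* defined: value is in [0, INT_MAX] */
    if ((double)id != value)              /* rejects values like 2.5 */
        return -1;
    *dimid = id;
    return 0;
}
```

With a guard like this, a NaN is rejected the same way on every platform before nc_def_var is ever called, instead of turning into -2147483648 on some machines and 0 on others.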

I rooted around the netCDF C library and found code in netcdf/libsrc/var.c (around line 449) that might be the culprit, though I may be way off:

    for(ip = varp->dimids, op = varp->shape
          ; ip < &varp->dimids[varp->ndims]; ip++, op++)
    {
        if(*ip < 0 || (size_t) (*ip) >= ((dims != NULL) ? dims->nelems : 1) )
            return NC_EBADDIM;
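To see why a zero slips through while the other platforms' value does not, here is a stand-alone sketch that mirrors only the bounds test quoted above. The function and macro names are hypothetical (not part of netCDF), and `bound` stands in for `((dims != NULL) ? dims->nelems : 1)`. The minimum int fails the `*ip < 0` test, whereas 0 is accepted whenever the bound is at least 1, because 0 is then a legitimate dimension ID.

```c
#include <stddef.h>

/* -46 is the NC_EBADDIM value mentioned earlier in this issue. */
#define SKETCH_NC_EBADDIM (-46)

/* Hypothetical mirror of the check in libsrc/var.c: returns 0 if the
 * dimension ID passes, SKETCH_NC_EBADDIM otherwise. `bound` plays the
 * role of ((dims != NULL) ? dims->nelems : 1). */
static int sketch_check_dimid(int dimid, size_t bound)
{
    if (dimid < 0 || (size_t)dimid >= bound)
        return SKETCH_NC_EBADDIM;
    return 0;
}
```

So the library cannot tell a NaN-derived 0 from a real first dimension; the distinction has to be made before the value reaches nc_def_var.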

I attached my C program (as text file) demonstrating the behavior. Thanks!

netcdfReproNaNDim.txt

ellenjohnson, Dec 01 '22 20:12

Hi Ellen, thanks for reporting this! I haven't had a chance to look at your test file yet, but from the description you've provided, I believe your conclusion is correct. We've had similar issues in the past reported, where the casting behavior is undefined, but consistent per architecture. I'll take a look to see what we can do, and what steps we can take to fix this. Thanks!

WardF, Dec 01 '22 20:12

Thank you Ward!

ellenjohnson, Dec 01 '22 21:12

The only solution I can think of is to disallow 0 as a dimid. However, I bet that breaks some of our test cases and possibly some users' code, though it shouldn't, because users should not assume they know a dimid in advance.

DennisHeimbigner, Dec 01 '22 22:12

@DennisHeimbigner It depends on whether the implicit cast happens upstream, with the result then passed to netCDF. In that case, I'm not sure there's anything we can do without introducing a sweeping change that could break the things you point out. Once I get the AGU materials finished I'll take a closer look.

WardF, Dec 01 '22 22:12

@WardF The implicit cast is present in the test code. 0 is being passed to the netCDF-c library.

dopplershift, Dec 01 '22 22:12