NetCDF 4.9.0 fails to create a NetCDF4 file on Windows (x86_64-w64-mingw32-gcc) with HDF5 1.12.1
- the version of the software with which you are encountering an issue
NetCDF 4.8.1
- environmental information (i.e. Operating System, compiler info, java version, python version, etc.)
Windows, mingw compiler (x86_64-w64-mingw32-gcc (GCC) 4.8.5)
- a description of the issue with the steps needed to reproduce it
NetCDF 4.8.1 fails to create a NetCDF4 file on Windows with HDF5 1.12.1 (binary built with mingw).
The issue was reported here (in the context of Julia) by @visr: https://github.com/Alexander-Barth/NCDatasets.jl/issues/164
The Julia code in that issue corresponds to the following C code:
retval = nc_create("test.nc4", NC_NETCDF4, &ncid);
So just creating a NetCDF4 file triggers the issue.
The error message is:
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7905cc0c -- nc4_create_file at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:124
in expression starting at C:\Users\visser_mn\.julia\packages\NCDatasets\TCrQh\test\runtests.jl:10
nc4_create_file at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:124
NC4_create at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:313
NC_create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:1926
nc__create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:464
nc_create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:391
nc_create at C:\Users\visser_mn\.julia\packages\NCDatasets\TCrQh\src\netcdf_c.jl:255
[...]
This corresponds to the following line: https://github.com/Unidata/netcdf-c/blob/v4.8.1/libhdf5/hdf5create.c#L124
To build NetCDF4 on Windows I have to apply these patches: https://github.com/Alexander-Barth/Yggdrasil/tree/NetCDF-v4.8.1/N/NetCDF/bundled/patches
The first patch is based on https://github.com/Unidata/netcdf-c/pull/2138.
NetCDF 4.8.1 works on all other tested platforms (Linux, Mac OS, even Mac OS-M1). We also had this issue with NetCDF 4.7.4 and HDF5 1.12.1 on Windows.
In Julia, all libraries are cross-compiled from a Linux-x86_64 environment targeting the different operating systems and CPU architectures. I am not sure where the problem actually is. It could also be in HDF5, the mingw compiler, ...
Any help would be greatly appreciated :-)
As a Linux user, I am not too familiar with Windows. I just want to get our software to work for our students, who are primarily Windows users.
Ref: https://github.com/Alexander-Barth/NCDatasets.jl/issues/164 https://github.com/JuliaPackaging/Yggdrasil/issues/4511 https://github.com/JuliaGeo/NetCDF.jl/issues/151 https://github.com/Unidata/netcdf-c/pull/2138 https://github.com/Unidata/netcdf-c/issues/2124
CC: @visr, @giordano
Thank you for the very comprehensive report! I'll take a look. I work primarily on Linux/OSX myself, with Windows being a secondary environment. I will try to duplicate this environment and the issue, and hopefully I will be able to provide some insight. Ideally I can track this down without going into Julia, as I unfortunately have no experience there.
Could this be a file permission problem? Do you have permission to create the file, or is there already a file of that name that you are attempting to overwrite without using NC_CLOBBER?
In our case, we also have this issue with randomly generated filenames in the Windows temporary directory (the Windows equivalent of /tmp) when running our test suite. I guess that @visr also checked that the file test.nc4 did not exist in his tests.
Yes indeed, I don't think it is a file permission issue. The filename didn't exist, and I could create a netcdf3-classic file in the same place with the same library.
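One way to rule out the file-permission and existing-file explanations independently of netCDF is to probe the target path with plain C I/O. The sketch below is only an illustration: `probe_exclusive_create` is a hypothetical helper, not a netCDF function, and it assumes a C runtime that supports the C11 `x` flag of `fopen` (older Windows runtimes may not).

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper (not part of netCDF): try to create `path` without
 * overwriting an existing file, similar in spirit to a no-clobber create.
 * Returns 0 if the file could be created (it is removed again afterwards),
 * otherwise the errno value reported by fopen, or -1 if errno was not set. */
int probe_exclusive_create(const char *path)
{
    FILE *fp;

    errno = 0;
    /* "wx" (C11): create for writing, fail if the file already exists. */
    fp = fopen(path, "wx");
    if (fp == NULL) {
        return errno != 0 ? errno : -1;   /* e.g. EEXIST or EACCES */
    }
    fclose(fp);
    remove(path);   /* leave no trace of the probe */
    return 0;
}
```

If `probe_exclusive_create("test.nc4")` returns 0 in the directory where `nc_create` crashes, a permission problem or a name clash is unlikely to be the cause.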
Just noting that we still see this with netCDF 4.9.0 built against HDF5 1.12.2, ref https://github.com/Alexander-Barth/NCDatasets.jl/issues/164#issuecomment-1202094798, but only when cross-compiling netCDF for Windows.
As a test, I added a simple test function to libnetcdf which only creates an HDF5 file access property list:
int my_test_function() {
    hid_t fapl_id = -1;
    int retval = NC_NOERR;
    printf("start\n");
    fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    printf("end\n");
    return retval;
}
Calling this function reproduces this crash:
$ /c/Users/Alexander\ Barth/AppData/Local/Programs/Julia-1.8.0-rc3/bin/julia.exe --eval ' using NetCDF_jll; ccall((:my_test_function, libnetcdf), Cint, ())'
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x66fb1758 -- my_test_function at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:45
in expression starting at none:1
my_test_function at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:45
top-level scope at .\none:1
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:897
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
ijl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:965
eval at .\boot.jl:368 [inlined]
exec_options at .\client.jl:276
_start at .\client.jl:522
jfptr__start_37025.clone_1 at C:\Users\Alexander Barth\AppData\Local\Programs\Julia-1.8.0-rc3\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:575
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:719
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:59
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 2903 (Pool: 2891; Big: 12); GC: 0
start
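In the output above, `start` shows up only after the crash report and `end` never appears, which is consistent with the crash happening inside `H5Pcreate`. The late `start` may simply be a buffering effect, so when bracketing a suspect call with progress markers it can help to flush each marker immediately. A minimal sketch, where `trace_step` is a hypothetical helper name and not part of netCDF or HDF5:

```c
#include <stdio.h>

/* Hypothetical helper: write a progress marker to `out` and flush it right
 * away, so the marker is not left in a buffer if the process crashes in the
 * next statement. Returns 0 on success, -1 on invalid arguments or I/O error. */
int trace_step(FILE *out, const char *label)
{
    if (out == NULL || label == NULL) {
        return -1;
    }
    if (fprintf(out, "[trace] %s\n", label) < 0) {
        return -1;
    }
    return fflush(out) == 0 ? 0 : -1;
}
```

In the test function this would replace the two `printf` calls, for example `trace_step(stderr, "before H5Pcreate");` and `trace_step(stderr, "after H5Pcreate");`.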
Updated issue title to reflect it is still observed in latest version.
This adds an interesting wrinkle, insofar as this is an HDF5 function being called absent any netCDF function calls. Because it's not completely separated from libnetcdf I can't say with absolute certainty that it's an HDF5 issue, but given the simple nature of the test program and the reliance on pure HDF5 code (the single HDF5 function call), it would suggest to me that this is an issue in libhdf5.
I'm not unfamiliar with cross-compilation, but it is not part of my regular workflow. Let me take a look and see if I can replicate this in a stand-alone test program that only uses HDF5 and is also cross-compiled. I hate to ask, but if this is something you can easily test on your end, @Alexander-Barth, it might be worth doing, so that we can really nail down whether this is in the netCDF layer or not.
Indeed, I was wondering the same thing:
https://github.com/JuliaPackaging/Yggdrasil/issues/4511#issuecomment-1207107732
The small example program using just HDF5 also failed when using the gcc compiler from the julia build environment (x86_64-w64-mingw32-gcc 4.8.5). Surprisingly, I can cross-compile the example program using the cross-compiler from Ubuntu 20.04 (with a more up-to-date version "9.3-win32 20200320").
So I am wondering if this could be a compiler bug triggered by some changes in HDF5 since version 1.12.1 (completely unrelated to NetCDF).
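Since the behaviour seems to depend on which GCC built the binary, it can be useful to record the compiler version inside the test program itself. The following is a sketch under assumptions: `compiler_version` is a hypothetical helper, and it relies only on the predefined macros `__GNUC__`, `__GNUC_MINOR__`, `__GNUC_PATCHLEVEL__` and `__MINGW32__`. Other compilers that emulate GCC also define the `__GNUC__` macros, hence the "gcc-compatible" wording.

```c
#include <stdio.h>

/* Predefined by the compiler when targeting MinGW (32- or 64-bit). */
#if defined(__MINGW32__)
#define TARGET_NOTE " (MinGW target)"
#else
#define TARGET_NOTE ""
#endif

/* Hypothetical helper: format the version of the compiler that built this
 * translation unit into `buf`. Returns what snprintf returns: the number of
 * characters needed, excluding the terminating null byte. */
int compiler_version(char *buf, size_t size)
{
#if defined(__GNUC__)
    return snprintf(buf, size, "gcc-compatible %d.%d.%d%s",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__,
                    TARGET_NOTE);
#else
    return snprintf(buf, size, "unknown compiler%s", TARGET_NOTE);
#endif
}
```

Printing this string next to the `start` marker would show directly whether a crashing binary came from the 4.8.5 toolchain or from a newer one.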
I am closing this issue since the problem did not show up again after upgrading the GCC version. Thanks to all who contributed to the discussion!