NetCDF 4.9.0 fails to create a NetCDF4 file on Windows (x86_64-w64-mingw32-gcc) with HDF5 1.12.1

Open Alexander-Barth opened this issue 3 years ago • 9 comments

  • the version of the software with which you are encountering an issue

NetCDF 4.8.1

  • environmental information (i.e. Operating System, compiler info, java version, python version, etc.)

Windows, mingw compiler (x86_64-w64-mingw32-gcc (GCC) 4.8.5)

  • a description of the issue with the steps needed to reproduce it

NetCDF 4.8.1 fails to create NetCDF4 files on Windows with HDF5 1.12.1 (binary from mingw).

The issue has been reported here (in the context of Julia) by @visr: https://github.com/Alexander-Barth/NCDatasets.jl/issues/164

The Julia code in the issue corresponds to the following C code:

retval = nc_create("test.nc4", NC_NETCDF4, &ncid);

So just creating a NetCDF4 file triggers the issue.

The error message is:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7905cc0c -- nc4_create_file at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:124
in expression starting at C:\Users\visser_mn\.julia\packages\NCDatasets\TCrQh\test\runtests.jl:10
nc4_create_file at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:124
NC4_create at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:313
NC_create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:1926
nc__create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:464
nc_create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:391
nc_create at C:\Users\visser_mn\.julia\packages\NCDatasets\TCrQh\src\netcdf_c.jl:255
[...]

This corresponds to the following line: https://github.com/Unidata/netcdf-c/blob/v4.8.1/libhdf5/hdf5create.c#L124

To build NetCDF4 on Windows I have to apply these patches: https://github.com/Alexander-Barth/Yggdrasil/tree/NetCDF-v4.8.1/N/NetCDF/bundled/patches

This first patch is based on https://github.com/Unidata/netcdf-c/pull/2138.

NetCDF 4.8.1 works on all other tested platforms (Linux, Mac OS, even Mac OS-M1). We also had this issue with NetCDF 4.7.4 and HDF5 1.12.1 on Windows.

In Julia, all libraries are cross-compiled from a Linux-x86_64 environment targeting the different OS and CPU architectures. I am not sure where the problem actually is. It could also be in HDF5, the mingw compiler, ...

Any help would be greatly appreciated :-)

As a Linux user, I am not too familiar with Windows. I just want to get our software to work for our students, who are primarily Windows users.

Ref: https://github.com/Alexander-Barth/NCDatasets.jl/issues/164 https://github.com/JuliaPackaging/Yggdrasil/issues/4511 https://github.com/JuliaGeo/NetCDF.jl/issues/151 https://github.com/Unidata/netcdf-c/pull/2138 https://github.com/Unidata/netcdf-c/issues/2124

CC: @visr, @giordano

Alexander-Barth avatar Mar 15 '22 13:03 Alexander-Barth

Thank you for the very comprehensive report! I'll take a look. I work primarily on Linux/OSX myself, with Windows as a secondary environment. I will try to duplicate this environment and the issue, and hopefully be able to provide some insight. Ideally I can track this down without going into Julia, as I unfortunately have no experience there.

WardF avatar Mar 15 '22 22:03 WardF

Could this be a file permission problem? Do you have permission to create the file, or is there already a file of that name that you are attempting to overwrite without using NC_CLOBBER?

edwardhartnett avatar Mar 16 '22 08:03 edwardhartnett

In our case, we also see this issue with randomly generated filenames in the Windows temporary directory (the Windows equivalent of /tmp) when running our test suite. I guess that @visr also checked that the file test.nc4 did not exist in his tests.

Alexander-Barth avatar Mar 16 '22 09:03 Alexander-Barth

Yes indeed, I don't think it is a file permission issue. The filename didn't exist, and I could create a netcdf3-classic file in the same place with the same library.
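For readers who want to rule out the permission question independently of netCDF, the check described above can be approximated with plain standard C. This is only a sketch: the helper name `path_is_creatable` is hypothetical, and it treats "readable" as "already exists", which is a simplification.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of netCDF or HDF5.
 * Returns 1 if `path` does not exist yet and can be created, else 0.
 * The probe file is removed again so the directory is left unchanged. */
int path_is_creatable(const char *path)
{
    assert(path != NULL);

    /* If the file can be opened for reading, it already exists. */
    FILE *existing = fopen(path, "rb");
    if (existing != NULL) {
        fclose(existing);
        return 0;
    }

    /* Try to create it; failure points at a permission or path problem. */
    FILE *probe = fopen(path, "wb");
    if (probe == NULL)
        return 0;

    fclose(probe);
    remove(path);
    return 1;
}
```

If this returns 1 for the target path and `nc_create` still crashes there, a file permission problem becomes an unlikely explanation.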

visr avatar Mar 16 '22 15:03 visr

Just noting that we still see this on netCDF 4.9.0 built against HDF5 1.12.2, ref https://github.com/Alexander-Barth/NCDatasets.jl/issues/164#issuecomment-1202094798, only when cross compiling netCDF for Windows.

visr avatar Aug 03 '22 09:08 visr

As a test, I added a simple test function to libnetcdf which only creates an HDF5 file access property list:

int my_test_function() {
    hid_t fapl_id = -1;
    int retval = NC_NOERR;
    printf("start\n");
    fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    printf("end\n");
    return retval;
}

Calling this function reproduces this crash:

$ /c/Users/Alexander\ Barth/AppData/Local/Programs/Julia-1.8.0-rc3/bin/julia.exe --eval ' using NetCDF_jll; ccall((:my_test_function, libnetcdf), Cint, ())'

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x66fb1758 -- my_test_function at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:45
in expression starting at none:1
my_test_function at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:45
top-level scope at .\none:1
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:897
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
ijl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:965
eval at .\boot.jl:368 [inlined]
exec_options at .\client.jl:276
_start at .\client.jl:522
jfptr__start_37025.clone_1 at C:\Users\Alexander Barth\AppData\Local\Programs\Julia-1.8.0-rc3\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:575
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:719
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:59
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 2903 (Pool: 2891; Big: 12); GC: 0
start
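Note that "start" shows up only after the backtrace and "end" never appears. The late "start" is most likely a buffering effect: stdout is usually fully buffered when it is not attached to a terminal, so the text is written out late. A minimal sketch of a marker helper that flushes immediately is given here; `trace_marker` is a hypothetical name, not something from netCDF or HDF5.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a progress marker and flush it immediately,
 * so it reaches the output even if the next call crashes the process.
 * Returns 0 on success and -1 if writing or flushing failed. */
int trace_marker(FILE *stream, const char *label)
{
    assert(stream != NULL && label != NULL);

    if (fprintf(stream, "%s\n", label) < 0)
        return -1;

    return fflush(stream) == 0 ? 0 : -1;
}
```

Calling `trace_marker(stderr, "start")` before `H5Pcreate` and `trace_marker(stderr, "end")` after it would make the position of the crash relative to the markers easier to read from the log.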

Alexander-Barth avatar Aug 05 '22 23:08 Alexander-Barth

Updated issue title to reflect it is still observed in latest version.

WardF avatar Aug 09 '22 15:08 WardF

This adds an interesting wrinkle, insofar as this is an HDF5 function being called absent any netCDF function calls. Because it's not completely separated from libnetcdf I can't say with absolute certainty that it's an HDF5 issue, but given the simple nature of the test program and the reliance on pure HDF5 code (the single HDF5 function call), it would suggest to me that this is an issue in libhdf5.

I'm not unfamiliar with cross-compilation, but it is not part of my regular workflow. Let me take a look and see if I can replicate this in a stand-alone test program that only uses HDF5 and is also cross-compiled. I hate to ask, but if this is something you can easily test on your end, @Alexander-Barth, it might be worth doing, so that we can really nail down whether this is in the netCDF layer or not.

WardF avatar Aug 09 '22 15:08 WardF

Indeed, I was wondering the same thing:

https://github.com/JuliaPackaging/Yggdrasil/issues/4511#issuecomment-1207107732

The small example program using just HDF5 also failed when using the gcc compiler from the julia build environment (x86_64-w64-mingw32-gcc 4.8.5). Surprisingly, I can cross-compile the example program using the cross-compiler from Ubuntu 20.04 (with a more up-to-date version "9.3-win32 20200320").

So I am wondering if this could be a compiler bug triggered by some changes in HDF5 since version 1.12.1 (completely unrelated to NetCDF).
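Since the behaviour differs between compiler versions, it can help to have a binary report which compiler built it. A minimal sketch using the predefined version macros of GCC and Clang follows; the helper name `describe_compiler` and the `TARGET_NOTE` macro are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Note which MinGW flavour (if any) this code was compiled for. */
#if defined(__MINGW64__)
#define TARGET_NOTE " (mingw-w64, 64-bit)"
#elif defined(__MINGW32__)
#define TARGET_NOTE " (MinGW, 32-bit)"
#else
#define TARGET_NOTE ""
#endif

/* Hypothetical helper: describe the compiler that built this translation
 * unit. Returns the snprintf result (length of the full description). */
int describe_compiler(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
#if defined(__clang__)
    /* Clang also defines __GNUC__, so it has to be checked first. */
    return snprintf(buf, size, "clang %d.%d.%d%s", __clang_major__,
                    __clang_minor__, __clang_patchlevel__, TARGET_NOTE);
#elif defined(__GNUC__)
    return snprintf(buf, size, "GCC %d.%d.%d%s", __GNUC__,
                    __GNUC_MINOR__, __GNUC_PATCHLEVEL__, TARGET_NOTE);
#else
    return snprintf(buf, size, "unknown compiler%s", TARGET_NOTE);
#endif
}
```

Compiled into the test program with each cross-compiler, this makes it easy to confirm whether a failing binary came from the 4.8.5 toolchain or the newer one.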

Alexander-Barth avatar Aug 10 '22 07:08 Alexander-Barth

I am closing this issue since the problem did not show up again after upgrading the GCC version. Thanks to all who contributed to the discussion!

Alexander-Barth avatar Aug 16 '22 20:08 Alexander-Barth