
NetCDF 4.8.1 does not work on Windows

Open Alexander-Barth opened this issue 4 years ago • 18 comments

Unfortunately, there is an issue with my PR https://github.com/JuliaPackaging/Yggdrasil/pull/3620. The Windows binary does not seem to work. The size of the library is also suspiciously small, and symbols like nc_inq_libvers are no longer found.

Alexander-Barth avatar Oct 12 '21 13:10 Alexander-Barth

It seems that users get libnetcdf.dll.a, while the NetCDF functions are in libnetcdf-19.dll. I am not sure about the difference between these files.

sandbox:${WORKSPACE}/srcdir/netcdf-c-4.8.1 # nm /workspace/destdir/lib/libnetcdf.dll.a  | grep nc_inq_libver
sandbox:${WORKSPACE}/srcdir/netcdf-c-4.8.1 # nm /workspace/srcdir/netcdf-c-4.8.1/liblib/.libs/libnetcdf-19.dll | grep nc_inq_libver
0000000066f483e0 T nc_inq_libvers

Alexander-Barth avatar Oct 12 '21 13:10 Alexander-Barth

Libraries for Windows are under bin/, not lib/:

sandbox:${WORKSPACE} # nm libnetcdf-19.dll | grep nc_inq_libvers
0000000066f483e0 T nc_inq_libvers
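
The nm | grep check above matches substrings, so a pattern such as nc_inq_libver would also hit longer names. A minimal sketch of a stricter check, assuming the three-column nm layout shown above (has_text_symbol is a hypothetical helper name, not part of binutils):

```shell
#!/bin/sh
# has_text_symbol NAME: read `nm` output on stdin and succeed only if NAME
# appears exactly as a defined text (code) symbol, i.e. with type letter "T".
# Hypothetical helper; assumes the "address type name" layout printed by nm.
has_text_symbol() {
    awk -v sym="$1" '$2 == "T" && $3 == sym { found = 1 } END { exit !found }'
}

# Usage (run where the DLL lives):
#   nm libnetcdf-19.dll | has_text_symbol nc_inq_libvers && echo present
```

Note that nm reports entries from the symbol table; whether a function is also listed in the DLL's export table is a separate question.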

giordano avatar Oct 12 '21 13:10 giordano

You are right, I should look at libnetcdf-19.dll in bin. Still, for some reason I get this error in my package:

https://github.com/Alexander-Barth/NCDatasets.jl/runs/3870249033?check_suite_focus=true#step:7:85

Alexander-Barth avatar Oct 12 '21 13:10 Alexander-Barth

On a Windows machine, I can see the function nc_inq_libvers using MSYS2's nm:

Alexander Barth@alex-laptop MSYS /c/Users/Alexander Barth/.julia/artifacts/ae78b073115f5cca9ab13a23994bdd930ecfe887/bin
$ /mingw64/bin/nm.exe libnetcdf-19.dll  | grep nc_inq_libvers
0000000066f483e0 T nc_inq_libvers

But in Julia 1.6.2, this function cannot be found:

julia> using NetCDF_jll

julia> ccall((:nc_inq_libvers,libnetcdf),Cstring,())
ERROR: could not load symbol "nc_inq_libvers":
The specified procedure could not be found.
Stacktrace:
 [1] top-level scope
   @ .\REPL[31]:1

(@v1.6) pkg> st NetCDF_jll
      Status `C:\Users\Alexander Barth\.julia\environments\v1.6\Project.toml`
  [7243133f] NetCDF_jll v400.802.100+0

julia> nclib = Libdl.dlopen(libnetcdf)
Ptr{Nothing} @0x0000000066f40000

julia> dlsym(nclib,:nc_inq_libvers)
ERROR: could not load symbol "nc_inq_libvers":
The specified procedure could not be found.
Stacktrace:
 [1] #dlsym#1
   @ .\libdl.jl:56 [inlined]
 [2] dlsym(hnd::Ptr{Nothing}, s::Symbol)
   @ Base.Libc.Libdl .\libdl.jl:54
 [3] top-level scope
   @ REPL[34]:1

Alexander-Barth avatar Oct 12 '21 13:10 Alexander-Barth

What if you dlopen the library with RTLD_LAZY|RTLD_DEEPBIND|RTLD_GLOBAL? See for example https://discourse.julialang.org/t/problem-when-loading-intel-mkl-shared-library-to-use-with-c-written-shared-library/69323/9

giordano avatar Oct 12 '21 13:10 giordano

Unfortunately, the result is still the same (from a fresh Julia session):

julia> using NetCDF_jll, Libdl

julia> nclib = Libdl.dlopen(libnetcdf,RTLD_LAZY|RTLD_DEEPBIND|RTLD_GLOBAL)
Ptr{Nothing} @0x0000000066f40000

julia> dlsym(nclib,:nc_inq_libvers)
ERROR: could not load symbol "nc_inq_libvers":
The specified procedure could not be found.
Stacktrace:
 [1] #dlsym#1
   @ .\libdl.jl:56 [inlined]
 [2] dlsym(hnd::Ptr{Nothing}, s::Symbol)
   @ Base.Libc.Libdl .\libdl.jl:54
 [3] top-level scope
   @ REPL[5]:1

I am trying with the old NetCDF library to see what could be the difference.

Alexander-Barth avatar Oct 12 '21 14:10 Alexander-Barth

If you do that after using NetCDF_jll, the library has already been dlopened, and I'm not sure it would be dlopened a second time.

giordano avatar Oct 12 '21 14:10 giordano

Just after restarting Julia:

julia> const oldlibnetcdf = "c:\\Users\\Alexander Barth\\.julia\\artifacts\\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\\bin\\libnetcdf-18.dll"
"c:\\Users\\Alexander Barth\\.julia\\artifacts\\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\\bin\\libnetcdf-18.dll"

julia> isfile(oldlibnetcdf)
true

julia> using Libdl

julia> ccall((:nc_inq_libvers,oldlibnetcdf),Cstring,())
ERROR: could not load library "c:\Users\Alexander Barth\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll"
The specified module could not be found.
Stacktrace:
 [1] top-level scope
   @ .\REPL[4]:1

julia> using NetCDF_jll

julia> ccall((:nc_inq_libvers,oldlibnetcdf),Cstring,())
Cstring(0x0000000068391260)

julia> unsafe_string(ccall((:nc_inq_libvers,oldlibnetcdf),Cstring,()))
"4.7.4 of Jan 26 2021 22:47:01 \$"

Maybe using NetCDF_jll also loads the dependencies, which makes the old library usable?

Alexander-Barth avatar Oct 12 '21 14:10 Alexander-Barth

Maybe using NetCDF_jll also loads the dependencies, which makes the old library usable?

Correct, and this is very likely the cause of the error here.
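
To see which dependencies have to come from somewhere else (for example from other artifacts loaded by the JLL package), one can check which imported DLL names are not sitting next to the library. This is only a sketch: missing_in_dir is a hypothetical helper, and the check is a plain file lookup that does not emulate the Windows DLL search order, so system libraries such as KERNEL32.dll will be reported too.

```shell
#!/bin/sh
# missing_in_dir DIR: read DLL file names on stdin (one per line) and print
# those that do not exist in DIR. Hypothetical helper for illustration only.
# The lookup is case-sensitive on Linux, unlike file names on Windows.
missing_in_dir() {
    dir=$1
    while IFS= read -r dll; do
        [ -e "$dir/$dll" ] || printf '%s\n' "$dll"
    done
}

# Usage, assuming objdump prints "DLL Name: <file>" lines for the imports:
#   objdump -p bin/libnetcdf-18.dll | awk '/DLL Name:/ { print $3 }' | missing_in_dir bin
```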

giordano avatar Oct 12 '21 14:10 giordano

I am not sure if this is relevant, but the new library is also linked against apphelp.dll (present on my Windows machine):

Alexander Barth@alex-laptop MSYS /c/Users/Alexander Barth/.julia/artifacts/a81fe95ac632a7fa76c5e9cbe522c998aee9fa21/bin
$ ldd ../../ae78b073115f5cca9ab13a23994bdd930ecfe887/bin/libnetcdf-19.dll
        ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffb0e970000)
        KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffb0dbd0000)
        KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffb0c220000)
        apphelp.dll => /c/WINDOWS/SYSTEM32/apphelp.dll (0x7ffb09540000)
        libnetcdf-19.dll => /c/Users/Alexander Barth/.julia/artifacts/ae78b073115f5cca9ab13a23994bdd930ecfe887/bin/libnetcdf-19.dll (0x66f40000)
        msvcrt.dll => /c/Windows/System32/msvcrt.dll (0x7ffb0dc90000)

Alexander Barth@alex-laptop MSYS /c/Users/Alexander Barth/.julia/artifacts/a81fe95ac632a7fa76c5e9cbe522c998aee9fa21/bin
$ ldd libnetcdf-18.dll
        ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffb0e970000)
        KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffb0dbd0000)
        KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffb0c220000)
        libnetcdf-18.dll => /c/Users/Alexander Barth/.julia/artifacts/a81fe95ac632a7fa76c5e9cbe522c998aee9fa21/bin/libnetcdf-18.dll (0x682c0000)
        msvcrt.dll => /c/Windows/System32/msvcrt.dll (0x7ffb0dc90000)

Alexander-Barth avatar Oct 12 '21 14:10 Alexander-Barth

Don't trust ldd: it dynamically dlopens a library and shows all libraries found on the current system, not the libraries the binary object links to:

sandbox:${WORKSPACE} # ${target}-objdump -x libnetcdf-19.dll | grep "DLL Name"
        DLL Name: KERNEL32.dll
        DLL Name: msvcrt.dll
        DLL Name: libcurl-4.dll
        DLL Name: libhdf5_hl-0.dll
        DLL Name: libhdf5-0.dll
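
The same check can be wrapped in a small filter that prints only the file names, which is convenient for comparing two builds. A minimal sketch, relying only on the "DLL Name:" lines shown above (dll_imports is a hypothetical helper name):

```shell
#!/bin/sh
# dll_imports: read `objdump -x` or `objdump -p` output on stdin and print
# the name of every imported DLL, one per line.
# Hypothetical helper; it keys on the "DLL Name: <file>" lines only.
dll_imports() {
    awk '/DLL Name:/ { print $3 }'
}

# Usage (the objdump binary name depends on the toolchain):
#   ${target}-objdump -x libnetcdf-19.dll | dll_imports
```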

giordano avatar Oct 12 '21 14:10 giordano

OK, good to know. I am trying to compile a simple C program using NetCDF.

First with NetCDF 4.7.4: this works

Alexander Barth@alex-laptop MINGW64 /c/Users/Alexander Barth/Downloads
$ cp /c/Users/Alexander\ Barth/.julia/artifacts/a81fe95ac632a7fa76c5e9cbe522c998aee9fa21/bin/libnetcdf-18.dll .

Alexander Barth@alex-laptop MINGW64 /c/Users/Alexander Barth/Downloads
$ gcc -I/c/Users/Alexander\ Barth/.julia/artifacts/a81fe95ac632a7fa76c5e9cbe522c998aee9fa21/include/ simple_xy_wr.c -L/c/Users/Alexander\ Barth/.julia/artifacts/a81fe95ac632a7fa76c5e9cbe522c998aee9fa21/lib -lnetcdf^C

Alexander Barth@alex-laptop MINGW64 /c/Users/Alexander Barth/Downloads
$ ./a.exe
*** SUCCESS writing example file simple_xy.nc!

Then with NetCDF 4.8.1, which fails: the linker is unable to find any of the NetCDF functions. Maybe they are not exported?

Alexander Barth@alex-laptop MINGW64 /c/Users/Alexander Barth/Downloads
$ cp /c/Users/Alexander\ Barth/.julia/artifacts/ae78b073115f5cca9ab13a23994bdd930ecfe887/bin/libnetcdf-19.dll .

Alexander Barth@alex-laptop MINGW64 /c/Users/Alexander Barth/Downloads
$ gcc -o simple_xy_wr_nc481 -I/c/Users/Alexander\ Barth/.julia/artifacts/ae78b073115f5cca9ab13a23994bdd930ecfe887/include/ simple_xy_wr.c -L/c/Users/Alexander\ Barth/.julia/artifacts/ae78b073115f5cca9ab13a23994bdd930ecfe887/lib -lnetcdf
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0xa4): undefined reference to `nc_create'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0xc0): undefined reference to `nc_strerror'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0xfd): undefined reference to `nc_def_dim'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x119): undefined reference to `nc_strerror'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x156): undefined reference to `nc_def_dim'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x172): undefined reference to `nc_strerror'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x1db): undefined reference to `nc_def_var'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x1f7): undefined reference to `nc_strerror'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x21d): undefined reference to `nc_enddef'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x239): undefined reference to `nc_strerror'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x26c): undefined reference to `nc_put_var_int'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x288): undefined reference to `nc_strerror'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x2ae): undefined reference to `nc_close'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\msys64\tmp\ccI2WaKl.o:simple_xy_wr.c:(.text+0x2ca): undefined reference to `nc_strerror'
collect2.exe: error: ld returned 1 exit status

Alexander-Barth avatar Oct 12 '21 14:10 Alexander-Barth

Using Dependency Walker on Windows or objdump -p libnetcdf-19.dll on Linux, it seems that the Export Address Table is much longer in the old version 4.7.4 than in the new version 4.8.1 (1771 symbols versus 13 symbols).
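
This also fits the earlier nm results: nm lists the symbol table, while dlsym and the linker look at the export table. A rough way to compare the two builds is to count the named exports in the objdump -p output. The sketch below assumes the usual binutils layout, in which a "[Ordinal/Name Pointer] Table" header is followed by indented "[ n] name" lines; count_exports is a hypothetical helper and the parsing may need adjusting for other binutils versions.

```shell
#!/bin/sh
# count_exports: read `objdump -p` output for a PE file on stdin and print
# the number of entries listed under "[Ordinal/Name Pointer] Table".
# Hypothetical helper; the section layout is an assumption about binutils.
count_exports() {
    awk '
        /^\[Ordinal\/Name Pointer\] Table/ { in_table = 1; next }
        in_table && /^[ \t]*[[]/           { n++ }       # indented "[ n] name" entry
        in_table && /^[^ \t]/              { in_table = 0 }  # next unindented header
        END { print n + 0 }
    '
}

# Usage:
#   objdump -p libnetcdf-19.dll | count_exports
#   objdump -p libnetcdf-18.dll | count_exports
```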

Alexander-Barth avatar Oct 12 '21 15:10 Alexander-Barth

I found a way to force the export of all symbols. But I am now facing a different issue on Windows:

https://github.com/Unidata/netcdf-c/issues/2124

Would it be possible to roll back the upgrade to its previous version? Sorry for all this confusion!

Alexander-Barth avatar Oct 12 '21 20:10 Alexander-Barth

I found a way to force the export of all symbols

Did you also try to do a native build with the same build options? Or is there any native build for Windows already available? I don't think we do anything special with regard to symbols being exported or not, so I'm curious to see what happens outside of BinaryBuilder.

Would it be possible to roll back the upgrade to its previous version? Sorry for all this confusion!

You can yank the offending versions in the registry.

giordano avatar Oct 12 '21 20:10 giordano

So far, I have not tried to make a native build and I am not aware of any native build available. Unfortunately, mingw64 is not officially supported by NetCDF (only the Microsoft C compiler on Windows) and the library contains a lot of platform-specific code.

For your information, I needed to use the -Wl,--export-all-symbols flag:

diff --git a/N/NetCDF/common.jl b/N/NetCDF/common.jl
index 3c04a6c1..8176f14f 100644
--- a/N/NetCDF/common.jl
+++ b/N/NetCDF/common.jl
@@ -34,7 +34,7 @@ if [[ ${target} == *-mingw* ]]; then
     export LIBS="-lhdf5-0 -lhdf5_hl-0 -lcurl-4 -lz"
     # linking fails with: "libtool:   error: can't build x86_64-w64-mingw32 shared library unless -no-undefined is specified"
     # unless -no-undefined is added to LDFLAGS
-    LDFLAGS_MAKE="${LDFLAGS} ${LIBS} -no-undefined"
+    LDFLAGS_MAKE="${LDFLAGS} ${LIBS} -no-undefined -Wl,--export-all-symbols"
 
     # testset fails on mingw (NetCDF 4.8.1)
     # libtool: link: cc -fno-strict-aliasing -o .libs/pathcvt.exe pathcvt.o  -L/workspace/destdir/bin ../liblib/.libs/libnetcdf.dll.a -lhdf5-0 -lhdf5_hl-0 -lcurl-4 -lz -L/workspace/destdir/lib

I found this concerning the --export-all-symbols flag: https://sourceware.org/binutils/docs-2.37/ld.html#WIN32 I hope to find some more time to debug this issue and make a native build on Windows.
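
According to the ld documentation linked above, the MinGW linker exports all global symbols by default, but switches this auto-export off as soon as a DEF file is used or any symbol is marked __declspec(dllexport); --export-all-symbols overrides that. Whether this is what happens in the NetCDF 4.8.1 build is not confirmed here. The sketch below only reproduces the effect on a two-function toy library; the tool names and the file demo.c are assumptions for illustration.

```shell
#!/bin/sh
# Assumed MinGW cross-toolchain names; override CC and OBJDUMP as needed.
CC=${CC:-x86_64-w64-mingw32-gcc}
OBJDUMP=${OBJDUMP:-x86_64-w64-mingw32-objdump}

workdir=$(mktemp -d)

# Toy source: one function is explicitly marked for export, the other is not.
cat > "$workdir/demo.c" <<'EOF'
__declspec(dllexport) int marked(void) { return 1; }
int unmarked(void) { return 2; }
EOF

if command -v "$CC" >/dev/null 2>&1 && command -v "$OBJDUMP" >/dev/null 2>&1; then
    # Default link: auto-export is disabled because demo.c uses dllexport.
    "$CC" -shared -o "$workdir/default.dll" "$workdir/demo.c"
    # Same source, but ask ld to export every global symbol.
    "$CC" -shared -o "$workdir/all.dll" "$workdir/demo.c" -Wl,--export-all-symbols
    for dll in default.dll all.dll; do
        echo "== $dll"
        # Show the export-table lines naming the two toy functions.
        "$OBJDUMP" -p "$workdir/$dll" | grep -E 'marked$'
    done
else
    echo "MinGW toolchain not found; nothing built" >&2
fi
```

With a working toolchain, default.dll is expected to list only marked, while all.dll should list unmarked as well.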

Thank you for the suggestion to yank the version in the registry. This is done now.

Alexander-Barth avatar Oct 13 '21 07:10 Alexander-Barth

I believe I'm crashing into this issue in #4344. Honestly, I have zero clue about what's wrong with the Windows build, but I confirm it looks "broken"

giordano avatar Jan 31 '22 00:01 giordano

Here is a fresh attempt: https://github.com/Alexander-Barth/Yggdrasil/tree/NetCDF-v4.8.1/N/NetCDF

I tried to find some volunteers to test the library before I make a PR: https://github.com/Alexander-Barth/NCDatasets.jl/issues/165

Alexander-Barth avatar Mar 02 '22 16:03 Alexander-Barth