NCDatasets.jl

NetCDF_jll v400.702.402+0 and later broken on Windows

visr opened this issue 3 years ago • 16 comments

Describe the bug

A new NetCDF_jll release was made a few days ago. On Windows, it appears not to work together with its HDF5 dependency.

This was introduced in https://github.com/JuliaPackaging/Yggdrasil/pull/4481, cc @felixcremer. I don't see anything wrong in the build file, but the main difference is that it is linked to HDF5_jll v1.12.1+0 instead of the older v1.12.0+1.

Shall we yank the build? We would lose a functioning Apple M1 build however.

Side note: this is not directly related to NCDatasets. @Alexander-Barth do you prefer that I create these issues in Yggdrasil? I thought here might be better for visibility for users.

To Reproduce

Here is an example using only NetCDF_jll, to keep it as simple as possible.

julia> using NetCDF_jll
julia> unsafe_string(ccall((:nc_inq_libvers, libnetcdf), Cstring, ()))
"4.7.4 of Feb 22 2022 14:00:01 \$"
julia> NC_CLASSIC_MODEL = 0x0100
julia> NC_NETCDF4 = 0x1000
julia> # it can create a classic netcdf (no HDF5 needed)
julia> ccall((:nc_create, libnetcdf), Cint, (Cstring, Cint, Ptr{Cint}), "test.nc3", NC_CLASSIC_MODEL, Ref(Cint(0)))
julia> # but gives a segfault on netcdf4
julia> ccall((:nc_create, libnetcdf), Cint, (Cstring, Cint, Ptr{Cint}), "test.nc4", NC_NETCDF4, Ref(Cint(0)))

Expected behavior

Create an empty "test.nc4" file.

Environment

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 3

Full output

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x2374cb3 -- .text at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
in expression starting at REPL[13]:1
.text at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
NC4_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
NC_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
nc__create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
nc_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
top-level scope at .\REPL[13]:1
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:876
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:894 [inlined]
jl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:944
eval at .\boot.jl:373 [inlined]
eval_user_input at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:150
repl_backend_loop at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:246
start_repl_backend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:231
#run_repl#47 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:364
run_repl at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:351
#930 at .\client.jl:394
jfptr_YY.930_36349.clone_1 at C:\Users\visser_mn\.julia\juliaup\julia-1.7.2+0~x64\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:757
#invokelatest#2 at .\essentials.jl:716 [inlined]
invokelatest at .\essentials.jl:714 [inlined]
run_main_repl at .\client.jl:379
exec_options at .\client.jl:309
_start at .\client.jl:495
jfptr__start_21275.clone_1 at C:\Users\visser_mn\.julia\juliaup\julia-1.7.2+0~x64\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:559
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:701
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:42
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 9681000 (Pool: 9675485; Big: 5515); GC: 13

Workaround

add NetCDF_jll@400.702.400

or, in Project.toml:

[compat]
NetCDF_jll = "=400.702.400"

visr, Feb 25 '22 17:02

Thanks a lot for letting me know! Can somebody on Windows test whether HDF5.jl with HDF5_jll v1.12.1+0 passes its test suite without failures?

Alexander-Barth, Feb 26 '22 06:02

Yes, I just checked: HDF5.jl doesn't have an issue with that HDF5_jll, so it seems both libraries work individually but not together on Windows.

visr, Feb 26 '22 12:02
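Since both libraries work individually but not together, one thing worth checking is which copies of the HDF5, NetCDF and curl libraries actually end up loaded in the Julia process. A minimal sketch using only the Libdl standard library; the helper name `loaded_libs` and its default name pattern are illustrative assumptions, not part of NCDatasets or the JLL packages:

```julia
using Libdl

# Hypothetical helper: return the paths of all shared libraries currently
# loaded in this process whose file name matches `pattern`.
function loaded_libs(pattern::Regex = r"hdf5|netcdf|curl"i)
    return filter(path -> occursin(pattern, basename(path)), Libdl.dllist())
end

# Print every matching library and the location it was loaded from.
for path in loaded_libs()
    println(path)
end
```

Run it after `using NCDatasets`; two different HDF5 (or curl) DLLs in the output might hint at a library mix-up, although a clean list does not rule one out.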

Thanks @visr for the update! I also filed a bug report here: https://github.com/JuliaPackaging/Yggdrasil/issues/4511

Alexander-Barth, Feb 27 '22 21:02

> Shall we yank the build? We would lose a functioning Apple M1 build however.

As there are many more Windows users than Apple M1 users, I would indeed be in favour of yanking the build of NetCDF_jll.

Alexander-Barth, Mar 06 '22 17:03

This issue is not resolved, but it is mitigated in NCDatasets 0.12 by declaring NCDatasets incompatible with NetCDF_jll v400.702.402+0.

Alexander-Barth, Mar 11 '22 08:03

Thanks! Just curious, why did you go this route versus yanking the build from the registry? Though I suppose this way Apple M1 users still have a way to use the build if they want.

visr, Mar 11 '22 08:03

> Though I suppose this way Apple M1 users still have a way to use the build if they want.

Yes, this is one reason, and I had a classroom full of students last Monday (mostly Windows, some Apple x86_64, some Linux) and needed a quick solution :-) But I consider this just a temporary fix. Having an installable but dysfunctional NetCDF_jll (for Windows) is still bad in my opinion.

Alexander-Barth, Mar 11 '22 08:03

For Julia 1.8, all jll packages that share a dependency with Julia need to be rebuilt. This is also the case for NetCDF_jll, as both NetCDF_jll and Julia depend on libcurl. Unfortunately, the workaround of setting the compat entry does not work with Julia 1.8 on Windows, and there is currently no installable NetCDF_jll on Windows with Julia 1.8 (Linux and macOS seem to be fine on Julia 1.8).

If a Windows user wants to contribute, here is an overview of the steps involved:

  1. Install BinaryBuilder
  2. Clone https://github.com/JuliaPackaging/Yggdrasil
  3. adapt the files N/NetCDF/common.jl and N/NetCDF/NetCDF@…/build_tarballs.jl (the version-specific directory), and possibly also the corresponding files for HDF5 (a dependency of NetCDF)
  4. get a GITHUB_TOKEN with write permission
  5. build the tarball with julia --color=yes ./build_tarballs.jl x86_64-w64-mingw32 --deploy="<your-github-username>/NetCDF_jll.jl" --verbose
  6. you can install your jll with Pkg.add(url="https://github.com/<your-github-username>/NetCDF_jll.jl")
  7. share your solution with a PR to Yggdrasil

Here is more information on BinaryBuilder: https://docs.binarybuilder.org/stable/. Windows support in BinaryBuilder is currently under active development. Alternatively, one can also use a Linux VM for BinaryBuilder.

The hard nut to crack is this:

  • https://github.com/Unidata/netcdf-c/issues/2248
  • https://github.com/JuliaPackaging/Yggdrasil/issues/4511

A workaround might be to install NetCDF via Conda.jl, or to build NetCDF_jll locally against HDF5 1.12.0 (but I have not tested these options, and one would likely need to adapt the NetCDF_jll compat entry in NCDatasets' Project.toml locally).

Alexander-Barth, Apr 26 '22 07:04

> For Julia 1.8 all jll packages (with a shared dependency with julia) need to be rebuilt.

I've been on 1.8 beta for a while and never encountered any issues with the Windows build that is currently pinned (NetCDF_jll v400.702.400+0), and the tests pass locally.

Do you know of any code examples that would fail on 1.8 Windows? I assume these would have to use the shared dependencies (curl, MbedTLS, zlib).

visr, Apr 26 '22 08:04

The world of dynamic libraries never ceases to surprise me! I saw some issues on Linux with Julia 1.8 where libnetcdf.so could not be loaded (similar to the transition from Julia 1.5 to Julia 1.6, where libcurl was also updated), and rebuilding NetCDF_jll was the solution then. On Linux, I can only use NetCDF_jll v400.802.102+0 with Julia 1.8 (not available for Windows).

But maybe Windows handles a library version mismatch better than Linux. Does an OPeNDAP URL work for you?

using NCDatasets
ds = NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1")

Alexander-Barth, Apr 26 '22 09:04
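For the case where libnetcdf.so "could not be loaded", the dynamic loader's own error message often names the missing dependency or symbol. A minimal sketch with the Libdl standard library; `load_error` is a hypothetical helper, and calling it on `NetCDF_jll.libnetcdf` (the path variable printed later in this thread) is an assumption about how one would use it:

```julia
using Libdl

# Hypothetical helper: try to open the shared library at `path`.
# Returns `nothing` on success, or the loader's error message (a String).
function load_error(path::AbstractString)
    handle = try
        Libdl.dlopen(path)
    catch err
        return sprint(showerror, err)
    end
    Libdl.dlclose(handle)  # we only wanted to know whether it loads
    return nothing
end

# Example usage (assumes NetCDF_jll is installed):
# using NetCDF_jll
# msg = load_error(NetCDF_jll.libnetcdf)
# msg === nothing ? println("loads fine") : println(msg)
```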

Ha, fascinating indeed. The opendap URL also just works, including when I load some actual data into memory (ds["time"][:]).

visr, Apr 26 '22 10:04

Thank you for confirming. I am hitting another bug on Linux (#173).

Alexander-Barth, Apr 26 '22 11:04

For future reference, here is how to pin the NetCDF_jll version (only needed for Windows users who want to run the NCDatasets master version on Julia 1.8):

using Pkg
Pkg.add("NetCDF_jll")
Pkg.pin(name="NetCDF_jll", version="400.702.400")

Alexander-Barth, Jul 08 '22 07:07
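To check that the pin actually took effect in the active environment, one can query the manifest through `Pkg.dependencies()`. A sketch, assuming the `name`, `version` and `is_pinned` fields of the returned package info; `dep_status` is a hypothetical helper:

```julia
using Pkg

# Hypothetical helper: look up a package in the active environment's
# manifest and return (version, is_pinned), or `nothing` if it is absent.
function dep_status(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        if info.name == name
            return (info.version, info.is_pinned)
        end
    end
    return nothing
end

# Example: after the pin above, the version should correspond to
# 400.702.400 and the second element should be `true`.
# dep_status("NetCDF_jll")
```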

For the record, here is a test I made in June with HDF5 1.12.2 from MSYS2 (Windows):

$ pacman -Q | grep -i HDF5
mingw-w64-x86_64-hdf5 1.12.2-1

I compiled NetCDF-C 4.8.1 from source in MSYS2:

./configure --disable-testsets  --enable-shared  --disable-static  --disable-dap-remote-tests
make LDFLAGS=" -no-undefined -Wl,--export-all-symbols" 

In Julia, I pointed the JLL packages to these libraries with set_preferences!:

using Preferences, HDF5_jll, NetCDF_jll

set_preferences!(HDF5_jll, "libhdf5_path" => raw"C:\msys64\mingw64\bin\libhdf5-0.dll")
set_preferences!(NetCDF_jll, "libnetcdf_path" => raw"C:\msys64\home\Alexander Barth\netcdf-c\liblib\.libs\libnetcdf-19.dll")

I then ran the test suite:

using NCDatasets
include(joinpath(dirname(pathof(NCDatasets)),"..","test","runtests.jl"))
NetCDF library: C:\msys64\home\Alexander Barth\netcdf-c\liblib\.libs\libnetcdf-19.dll

I had no failures:

NetCDF version: 4.8.1 of Jun  8 2022 21:44:34 $
Test Summary: | Pass  Total
NCDatasets    |  829    829
Test Summary:  | Pass  Total
NetCDF4 groups |    9      9
Test Summary:          | Pass  Total
Variable-length arrays |   22     22
Test Summary:  | Pass  Total
Compound types |   16     16
Test Summary:      | Pass  Total
Time and calendars |   25     25
Test Summary:       | Pass  Total
Multi-file datasets |   70     70
Test Summary:     | Pass  Total
Deferred datasets |   13     13
Test Summary: | Pass  Total
@select macro |   33     33
Test.DefaultTestSet("@select macro", Any[], 33, false, false)

Surprisingly, when NetCDF 4.9.0 is compiled with BinaryBuilder using the recently released HDF5_jll 1.12.2, I now get these errors again in Julia 1.8.0-rc3:

NetCDF_jll.libnetcdf = "C:\\Users\\runneradmin\\.julia\\artifacts\\e3b96f6ac2bb213ecbcbce2ca0ac0bb43bf9561d\\bin\\libnetcdf-19.dll"
NetCDF library: C:\Users\runneradmin\.julia\artifacts\e3b96f6ac2bb213ecbcbce2ca0ac0bb43bf9561d\bin\libnetcdf-19.dll
NetCDF version: 4.9.0 of Aug  1 2022 12:57:29 $

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x66fb1a1c -- nc4_create_file at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:120
in expression starting at D:\a\NCDatasets.jl\NCDatasets.jl\test\test_simple.jl:11
nc4_create_file at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:120
NC4_create at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:321

Full logs: https://github.com/Alexander-Barth/NCDatasets.jl/runs/7612204603?check_suite_focus=true

If somebody wants to try it, here is the DLL of NetCDF 4.8.1 that worked for me: https://dox.ulg.ac.be/index.php/s/DSZy9SNCUJmCRZA

$ sha1sum libnetcdf-19.dll
47efeb7dcc8756d62d4bd832f2f3bbb8e7fd2c09 *libnetcdf-19.dll

Alexander-Barth, Aug 02 '22 06:08
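On Windows, where `sha1sum` is typically only available through MSYS2 or similar, the same checksum can be computed from Julia with the SHA standard library. A minimal sketch; `file_sha1` and `EXPECTED_SHA1` are illustrative names, and the file name assumes the DLL was saved in the current directory:

```julia
using SHA

# Hypothetical helper: SHA-1 of a file as a lowercase hex string,
# the same hash that `sha1sum` prints.
file_sha1(path::AbstractString) = open(io -> bytes2hex(sha1(io)), path)

# Checksum posted above for the NetCDF 4.8.1 DLL.
const EXPECTED_SHA1 = "47efeb7dcc8756d62d4bd832f2f3bbb8e7fd2c09"

# Example usage:
# file_sha1("libnetcdf-19.dll") == EXPECTED_SHA1
```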

Oh, that's a bummer! So do I understand correctly that what matters is not the HDF5 patch version (1 or 2), but whether or not netCDF is cross-compiled? And that the last cross-compiled netCDF successfully built against an HDF5 MinGW build was netCDF 4.7 against HDF5 1.12.0? https://github.com/JuliaPackaging/Yggdrasil/blob/5b9aa3d48766ab2681f6b92e0b7e6116ddfc5e27/N/NetCDF/common.jl

visr, Aug 02 '22 08:08

To be honest, I am not sure what exactly triggers this error, but it does appear to be a cross-compilation issue introduced in HDF5 1.12.1. As far as I know, netCDF only tests native compilation, and only relatively recently added the MinGW compiler to its CI.

Alexander-Barth, Aug 02 '22 17:08

This long-standing issue should be fixed thanks to NetCDF_jll 400.902.5.

Alexander-Barth, Aug 18 '22 20:08
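For reference, the NetCDF_jll version range discussed in this issue can be written as a small check. This is a sketch under stated assumptions: the lower bound comes from the issue title and the upper bound from the fix announced above; intermediate versions were not all tested individually, so "suspect" is deliberately weaker than "broken". The function name is hypothetical:

```julia
# Hypothetical helper: is this NetCDF_jll version inside the range reported
# as problematic on Windows in this thread (v400.702.402+0 up to, but not
# including, the fix in 400.902.5)? Assumption: not every version in between
# was tested.
function suspect_on_windows(v::VersionNumber)
    core = VersionNumber(v.major, v.minor, v.patch)  # drop "+build" metadata
    return v"400.702.402" <= core < v"400.902.5"
end

suspect_on_windows(v"400.702.400")    # → false (pinned workaround version)
suspect_on_windows(v"400.702.402+0")  # → true  (first build reported broken)
suspect_on_windows(v"400.902.5")      # → false (build announced as the fix)
```

On Julia 1.9 or later, `pkgversion(NetCDF_jll)` returns the installed version, so combining this with `Sys.iswindows()` could give a load-time warning as an alternative to the compat bound used in NCDatasets 0.12.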

Thanks a lot for a major effort! Hope it gets easier. Looks like this issue can be unpinned as well :)

visr, Aug 18 '22 21:08

Thanks Martijn for your help too!

Alexander-Barth, Aug 19 '22 13:08