Try to reproduce opam-repository CI failure on Windows

Open aantron opened this issue 1 year ago • 13 comments

During https://github.com/ocaml/opam-repository/pull/26501, in https://github.com/ocaml/opam-repository/actions/runs/10723880870/job/29738202075?pr=26501:

#=== ERROR while compiling luv.0.5.14 =========================================#
  # context     2.2.1 | win32/x86_64 | ocaml.5.2.0 | file://D:/a/opam-repository/opam-repository
  # path        D:\opamroot\default\.opam-switch\build\luv.0.5.14
  # command     D:\opamroot\default\bin\dune.exe build -p luv -j 3
  # exit-code   1
  # env-file    D:\opamroot\log\luv-1908-3d2dda.env
  # output-file D:\opamroot\log\luv-1908-3d2dda.out
  ### output ###
  # File "_build/.dune/default/src/c/dune", lines 47-66, characters 0-815:
  # 47 | (rule
  # 48 |  (targets libuv.a dlluv%{ext_dll})
  # 49 |  (deps (source_tree vendor))
  # ....
  # 64 |    "cp vendor/libuv/libuv.so.1.0.0 dlluv.so || \
  # 65 |     cp vendor/libuv/libuv.1.dylib dlluv.so || \
  # 66 |     cp vendor/bin/libuv-1.dll dlluv.dll")))))
  # (cd _build/default/src/c/vendor/libuv && D:\opamroot\.cygwin\root\bin\bash.exe -e -u -o pipefail -c "sh libtool --silent --no-warnings --mode install cp libuv.la `pwd`") &> D:\opamroot\default\.opam-switch\build\luv.0.5.14\nul
  # Command exited with code 1.

The build seems to work in Luv's own CI, but maybe it's significant that this is on 5.2.0, or there is another issue.

aantron avatar Sep 05 '24 17:09 aantron

Hm, my PR #160 fixed a problem in this same environment (opam 2.2 on windows without the old mingw repo). I seem to remember seeing no build errors after I applied that patch, and I've used it as a dependency of another package without issues (or without any obvious ones). However, now when I try to run make in the repository it fails with the same error as in the CI...

tobil4sk avatar Sep 06 '24 07:09 tobil4sk

Thanks! This is, however, a different failure from what you fixed in #160, right?

Could you perhaps look into what is causing it? Usually this means that a library archive or .dll file was not generated for those cp commands to copy. Could you list the build directories (in the example above, reported as D:\opamroot\default\.opam-switch\build\luv.0.5.14 and cd _build/default/src/c/vendor/libuv) and see what is in there? If you can reproduce success again with a previous setup or commit of Luv, can you compare?
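
A minimal sketch of that check, assuming a POSIX shell. `report_outputs` is a hypothetical helper name, not part of luv or dune; the candidate file names are the ones the cp commands in the log above try to copy, relative to the directory holding the generated dune rule.

```shell
# Hypothetical helper: report which of the shared-library files named in the
# failing rule's cp commands exist under the given build directory.
report_outputs() {
  root=$1
  for f in \
    vendor/libuv/libuv.so.1.0.0 \
    vendor/libuv/libuv.1.dylib \
    vendor/bin/libuv-1.dll
  do
    if [ -e "$root/$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
}
```

For example, `report_outputs _build/default/src/c` run from the package build directory. If all three are reported missing, the libuv build itself most likely did not produce a shared library.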

aantron avatar Sep 06 '24 08:09 aantron

I wonder if the sunset repo and the default opam repo somehow cause different variables for ext_dll or otherwise. If you can spot anything like that in the detailed logs, that would be very helpful. At the top of the dune file for the vendored build there are instructions for how to make it fully verbose:

https://github.com/aantron/luv/blob/d92c8b7dd5a3e254241e065e70592666b6b1d8c0/src/c/dune#L3-L9

Could you try those and see if you spot anything, like different extensions, different files being generated, final linking steps not being run?

aantron avatar Sep 06 '24 08:09 aantron

It's also possible, maybe even likely, that the sunset repo and the standard opam repo cause libuv to self-configure in a different way during its configure script, eventually leading to not generating a shared library output, or similar.

aantron avatar Sep 06 '24 08:09 aantron

> Thanks! This is, however, a different failure from what you fixed in https://github.com/aantron/luv/pull/160, right?

Yes, this is a separate issue.

> Could you perhaps look into what is causing it?

Yes, I'll have a look this evening.

> If you can reproduce success again with a previous setup or commit of Luv, can you compare?

It was built previously with the same (or equivalent) commit, so I'm thinking that maybe I just missed the error previously and assumed that it installed properly.

tobil4sk avatar Sep 06 '24 08:09 tobil4sk

> cd _build/default/src/c/vendor/libuv

I seem to remember that when I checked this when I built last time, it was completely empty or didn't exist. I will double check to confirm.

> I wonder if the sunset repo and the default opam repo somehow cause different variables for ext_dll or otherwise

This seems plausible. I will follow the steps for the verbose build and report back, hopefully I'll find something. The new setup with setup-ocaml v3 is using opam 2.2 which now supports windows natively, so something might be different.

tobil4sk avatar Sep 06 '24 09:09 tobil4sk

Hm, this also seems the same as #132? But was reported before opam 2.2.

tobil4sk avatar Sep 06 '24 09:09 tobil4sk

Ah, very likely! I was unable to reproduce it at the time in my environment, and it didn't appear in CI.

Thank you!

aantron avatar Sep 06 '24 09:09 aantron

With the verbose compilation output, I got a lot of compiler errors. I did some research and it led me to this comment in a libuv issue: https://github.com/libuv/libuv/issues/1421#issuecomment-315405453

My build was failing because in my PATH I had C:\msys64\usr\bin;C:\msys64\mingw64\bin;... so the msys2 tooling (in usr) took priority over mingw64, but only mingw64 contains all the required headers for a proper windows build. When I swap these back around (C:\msys64\mingw64\bin;C:\msys64\usr\bin;), the build runs successfully.
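
A rough sketch of how that ordering could be checked from a POSIX shell, where PATH is colon-separated. Both function names are hypothetical, and the directory spellings passed in are assumptions: use whatever `echo "$PATH"` actually shows for the mingw64 and usr directories in your environment.

```shell
# Print the 1-based position of directory $1 in the colon-separated list $2.
# Returns non-zero (and prints nothing) if the directory is not in the list.
path_position() {
  dir=$1
  list=$2
  i=1
  old_ifs=$IFS
  IFS=:
  for entry in $list; do
    if [ "$entry" = "$dir" ]; then
      IFS=$old_ifs
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  IFS=$old_ifs
  return 1
}

# Succeed only if both $1 and $2 are in list $3 and $1 comes first.
comes_before() {
  first=$(path_position "$1" "$3") || return 1
  second=$(path_position "$2" "$3") || return 1
  [ "$first" -lt "$second" ]
}
```

For example, `comes_before /mingw64/bin /usr/bin "$PATH" || echo "mingw64 does not take priority"`.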

This would indicate that there is a problem with the PATH setup in the opam repository ci, rather than anything that has to be fixed here.

tobil4sk avatar Sep 06 '24 22:09 tobil4sk

Hm, that issue I had was in an msys2 environment. When I set up opam with cygwin instead, I get this output from configure:

checking for x86_64-w64-mingw32-ar... no
checking for x86_64-w64-mingw32-lib... no
checking for x86_64-w64-mingw32-link... no
checking for ar... no
checking for lib... no
checking for link... link -lib
checking the archiver (link -lib) interface... unknown
configure: error: could not determine link -lib interface

tobil4sk avatar Sep 07 '24 09:09 tobil4sk

For cygwin, this entry was missing from my PATH (and from PATH in setup-ocaml), which is where ar is located:

/usr/x86_64-w64-mingw32/bin

The opam-repository script doesn't even add cygwin's /usr/bin, so it isn't using cygwin's make or bash executables at all; it picks them up from other pre-installed software on GitHub Actions.
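
One way to see which executables are actually being picked up is to ask the shell where it resolves each tool. This is only a sketch; `report_tool` is a hypothetical helper name, and it relies on nothing beyond the POSIX `command -v` builtin.

```shell
# Print the resolved location of a tool, or a "not found" message.
report_tool() {
  if location=$(command -v "$1"); then
    echo "$1: $location"
  else
    echo "$1: not found on PATH"
  fi
}

# The tools mentioned above: ar lives in the mingw32 directory,
# make and bash should come from cygwin's /usr/bin.
for tool in ar make bash; do
  report_tool "$tool"
done
```

If make or bash resolve to a directory outside the cygwin root, the build is using some other pre-installed copy.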

tobil4sk avatar Sep 07 '24 10:09 tobil4sk

luv's build workflow succeeds with this patch to setup-ocaml: https://github.com/ocaml/setup-ocaml/pull/859.

To reiterate, for a successful build it is necessary to have these values in PATH:

  • For msys2: $MSYS2_ROOT/mingw64/bin and $MSYS2_ROOT/usr/bin (the mingw64 directory must take priority)
  • For cygwin: $CYGWIN_ROOT/usr/x86_64-w64-mingw32/bin and $CYGWIN_ROOT/usr/bin
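
As a sketch of the two cases, assuming a POSIX shell in which `MSYS2_ROOT` or `CYGWIN_ROOT` already points at the install root. The function names are hypothetical, and for cygwin the relative order of the two directories is my assumption; only the msys2 order is known to matter.

```shell
# Build a PATH for msys2: mingw64 must come before usr.
msys2_path() {
  printf '%s\n' "$1/mingw64/bin:$1/usr/bin:$2"
}

# Build a PATH for cygwin: adds the directory containing ar plus /usr/bin.
# (Order between these two entries is an assumption.)
cygwin_path() {
  printf '%s\n' "$1/usr/x86_64-w64-mingw32/bin:$1/usr/bin:$2"
}
```

Usage would look like `PATH=$(msys2_path "$MSYS2_ROOT" "$PATH")` or `PATH=$(cygwin_path "$CYGWIN_ROOT" "$PATH")` before running the build.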

tobil4sk avatar Sep 07 '24 11:09 tobil4sk

Amazing, thank you! I've followed all those issues, let's see what the upstream maintainers say.

aantron avatar Sep 07 '24 19:09 aantron