
Unexpected failures on windows runners

Open UebelAndre opened this issue 4 years ago • 16 comments

Initially noticed on https://github.com/bazelbuild/rules_rust/pull/879#issuecomment-894896993

Did something change about the windows runners recently? As you can see in that comment, build 4029 succeeded but build 4046 failed when no changes had been made to that branch.

UebelAndre avatar Aug 09 '21 14:08 UebelAndre

cc @hlopko for visibility

UebelAndre avatar Aug 09 '21 14:08 UebelAndre

Hi,

Yes, I upgraded our VMs and containers over the weekend. I summarized the Windows changes here: https://github.com/bazelbuild/bazel/issues/13816

We've seen a couple of issues with the new image and are currently fixing broken tests etc. across CI.

philwo avatar Aug 09 '21 14:08 philwo

That failure is strange. We didn’t change anything about Visual Studio except the version of the Windows SDK in the new images. :/

philwo avatar Aug 09 '21 14:08 philwo

Oh… maybe the failing test is accidentally using a "link" executable from MSYS while you're expecting a link.exe from Visual Studio?

philwo avatar Aug 09 '21 14:08 philwo

@meteorcloudy Could we remove the mingw32 compiler, gcc, etc. from MSYS2? Why do we need / install them when we want to build with Visual Studio on Windows? (Not sure that that’s where the /usr/bin/link is coming from, but it’s my guess)

philwo avatar Aug 09 '21 14:08 philwo

Oh no, link.exe actually comes from https://packages.msys2.org/package/coreutils?repo=msys&variant=x86_64 … :/

philwo avatar Aug 09 '21 14:08 philwo

Yeah, /usr/bin/link should have existed even before the VM upgrade, so something else must have caused the build to find the Unix link instead of the one from the VC++ build tools.

meteorcloudy avatar Aug 09 '21 14:08 meteorcloudy

This is how rust finds the link.exe binary: https://github.com/rust-lang/rust/blob/eaf6f463599df1f18da94a6965e216ea15795417/compiler/rustc_codegen_ssa/src/back/link.rs#L851

meteorcloudy avatar Aug 09 '21 15:08 meteorcloudy

I'm not very familiar with Rust and can't figure out what went wrong. @UebelAndre, can you help take a look?

meteorcloudy avatar Aug 09 '21 15:08 meteorcloudy

I may have some time toward the end of the day today to take a closer look but otherwise don't have too much free time this week so can't commit to too much support 😞

UebelAndre avatar Aug 09 '21 15:08 UebelAndre

Basically this API doesn't work correctly: https://docs.rs/cc/1.0.29/cc/windows_registry/fn.find_tool.html

I wonder if the registry is somehow messed up?

meteorcloudy avatar Aug 09 '21 15:08 meteorcloudy

rules_rust takes the linker from the C++ toolchain; we don't use the rustc defaults. This is where this happens: https://github.com/bazelbuild/rules_rust/blob/main/rust/private/rustc.bzl#L229.

I'm hopeless when it comes to Windows, but to me it seems the error is not directly Rust related. @meteorcloudy, does the invocation look reasonable to you?

"link.exe" "/NOLOGO" "C:\\temp\\rustdoctest1qTZce\\rust_out.rust_out.7rcbfp3g-cgu.0.rcgu.o" "C:\\temp\\rustdoctest1qTZce\\rust_out.33dyzt1ekirinwy8.rcgu.o" "/LIBPATH:C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib" "C:\\b\\yzf2u4jt\\execroot\\rules_rust\\bazel-out\\x64_windows-fastbuild\\bin\\external\\examples\\fibonacci\\libfibonacci--608111072.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libstd-3d786a338e3fbd3c.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libpanic_unwind-c7722f94ca812e0f.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libstd_detect-f6ac1aae8e3d5b95.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\librustc_demangle-8244d5c29082f380.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libhashbrown-c29ed8b388a545d6.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\librustc_std_workspace_alloc-daec0207219073db.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libunwind-e1164c8529217a2a.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libcfg_if-78991d36592dccee.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\liblibc-3e2bb97c5be118b7.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\liballoc-d5bd6400adb9fa95.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\librustc_std_workspace_core-07dcecfd1f459221.rlib" "C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libcore-f0c150dc0abba70a.rlib" 
"C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib\\libcompiler_builtins-0f3806ca1d72c7be.rlib" "kernel32.lib" "ws2_32.lib" "advapi32.lib" "userenv.lib" "kernel32.lib" "msvcrt.lib" "/NXCOMPAT" "/LIBPATH:C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\x86_64-pc-windows-msvc\\lib" "/OUT:C:\\temp\\rustdoctest1qTZce\\rust_out" "/OPT:REF,NOICF" "/DEBUG" "/NATVIS:C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\etc\\intrinsic.natvis" "/NATVIS:C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\etc\\liballoc.natvis" "/NATVIS:C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\etc\\libcore.natvis" "/NATVIS:C:\\b\\yzf2u4jt\\external\\rust_windows_x86_64\\lib\\rustlib\\etc\\libstd.natvis"

We also take linker flags from the C++ toolchain, but we construct flags for libraries to link ourselves here: https://github.com/bazelbuild/rules_rust/blob/main/rust/private/rustc.bzl#L879. Is that logic correct?

What would the error look like if there were undefined symbols or cyclic dependencies between libraries? Could we be hitting that problem? Could this be related to https://github.com/bazelbuild/rules_rust/issues/637? One way to debug this further would be to replace rust_binary with a cc_binary:

  • don't use rust_binary/rust_test for the test, use rust_library
  • add a cc_binary that depends on the rust_library
  • set crate_root in the rust_library to point to the main.rs
  • in main.rs, replace fn main() {...} with
#[no_mangle]
extern "C" fn main() {...}

hlopko avatar Aug 10 '21 07:08 hlopko

If ^ works when C++ constructs the linking action, we can diff the command line and see further where things went wrong.
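
As a rough illustration of the "diff the command line" step, here is a minimal Rust sketch. The helper `only_in` is hypothetical (it is not part of rules_rust or Bazel), and it compares arguments as sets, so it ignores ordering and repeated arguments such as the duplicated "kernel32.lib" in the invocation above.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns the arguments that appear in `left`
/// but not in `right`, sorted and de-duplicated.
fn only_in<'a>(left: &[&'a str], right: &[&'a str]) -> Vec<&'a str> {
    let right_set: BTreeSet<&'a str> = right.iter().copied().collect();
    let left_set: BTreeSet<&'a str> = left.iter().copied().collect();
    left_set
        .into_iter()
        .filter(|arg| !right_set.contains(arg))
        .collect()
}
```

Calling it once in each direction (Rust-constructed arguments against C++-constructed ones, then the reverse) gives the two halves of the diff.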

hlopko avatar Aug 10 '21 07:08 hlopko

Interesting, thanks Marcel!

We can see in the logs that the "outer" Bazel can successfully build Rust files (at least there are a few messages that suggest that, like "(01:17:50) INFO: From Compiling Rust rlib libc (59 files): [...]" and "Compiling Rust bin cargo_build_script_runner (1 files); 1s local, remote-cache").

But then the "inner" Bazel in the (shell) tests fails when building Rust.

My guess is that inside the shell tests the PATH is wrong and puts MSYS2's /usr/bin before the other directories. Because the command line does not call link.exe using an absolute path, it uses MSYS2's binary instead of the one provided by Visual Studio.
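
To make the suspected mechanism concrete, here is a minimal Rust sketch of first-match lookup over an ordered list of directories. It is a simplified model, not what Bazel or rustc actually run: the function name `resolve_first` is hypothetical, and real Windows process creation also checks other locations before PATH.

```rust
use std::path::PathBuf;

/// Hypothetical helper: returns the first `<dir>/<tool>` that exists as a file,
/// scanning `search_dirs` in order. The first match wins, so directory order
/// alone decides which "link.exe" gets picked up.
fn resolve_first(tool: &str, search_dirs: &[PathBuf]) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(tool))
        .find(|candidate| candidate.is_file())
}
```

With two directories that both contain a link.exe, swapping their order in `search_dirs` changes the result, which is the behavior the guess above depends on.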

philwo avatar Aug 10 '21 08:08 philwo

Oh wow, I've never looked into the implementation of rust_doc_test, so I didn't know about the shell script inside. Still, there doesn't seem to be an outer/inner Bazel, just potentially slightly different compilation actions.

@UebelAndre can you maybe take a look at whether regular rustc actions and rust_doc compilation actions differ in their env or in the paths used? That could be an explanation.

hlopko avatar Aug 11 '21 11:08 hlopko

The difference is that rust_doc is attempting to run a compile action as a test. rustdoc will compile the doc tests and then run them. Unfortunately, there's no stable way for rustdoc to build tests and run them later, so I think the question becomes: what can be done to recreate the same environment used for Rustc actions in this test target's execution? In attempting this, I ran into a myriad of issues which might be solvable, but I don't currently know how, or even whether I've encountered all the issues that need solving to get that working.
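
One conceivable piece of recreating that environment is making sure the toolchain's linker directory comes first on PATH before the test process is spawned. The Rust sketch below only illustrates that idea under stated assumptions: `path_with_dir_first` is a hypothetical helper, not something rules_rust provides, and it does not address the other issues mentioned above.

```rust
use std::env::{join_paths, split_paths, JoinPathsError};
use std::ffi::{OsStr, OsString};
use std::path::Path;

/// Hypothetical helper: builds a PATH value in which `dir` comes first.
/// Any existing occurrence of `dir` is dropped so it is not listed twice;
/// the relative order of all other entries is preserved.
fn path_with_dir_first(dir: &Path, current_path: &OsStr) -> Result<OsString, JoinPathsError> {
    let rest = split_paths(current_path).filter(|entry| entry.as_path() != dir);
    join_paths(std::iter::once(dir.to_path_buf()).chain(rest))
}
```

The resulting value could then be passed to a child process with `std::process::Command::env("PATH", ...)`, leaving the parent's environment untouched.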

UebelAndre avatar Aug 11 '21 16:08 UebelAndre