
Bazel 5.3 fails to run external tests

Open AustinSchuh opened this issue 1 year ago • 9 comments

Description of the bug:

Bazel fails to run C++ tests in external repositories on remote execution.

Running locally passes, even with linux-sandbox. Bazel 5.0 worked. This has broken every Bazel release after 5.0 for us and is blocking all upgrades.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

bazel test -c opt --config=engflow --config=build_without_the_bytes @aos//aos:condition_test

Run a C++ test from an external repository on remote execution. (I can't give you access to a remote execution cluster.)

Which operating system are you running Bazel on?

Debian Bullseye

What is the output of bazel info release?

release 5.3.0-202207291633+f440f8ec3f

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

#!/bin/bash

# This script builds a Debian package from a Bazel source tree with the
# correct version.
# The only argument is the path to a Bazel source tree.

set -e
set -u

BAZEL_SOURCE="$1"

VERSION="5.3.0-$(date +%Y%m%d%H%M)+$(GIT_DIR="${BAZEL_SOURCE}/.git" git rev-parse --short HEAD)"
OUTPUT="bazel_${VERSION}"

(
cd "${BAZEL_SOURCE}"
bazel build -c opt //src:bazel --embed_label="${VERSION}" --stamp=yes
)

cp "${BAZEL_SOURCE}/bazel-bin/src/bazel" "${OUTPUT}"

echo "Output is at ${OUTPUT}"

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

austin[504] cmbr (release-5.3.0) ~/local/bazel
$ git remote get-url origin; git rev-parse master; git rev-parse HEAD
https://github.com/bazelbuild/bazel
master
fatal: ambiguous argument 'master': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
f440f8ec3f63e5d663e1f9d9614f05a39422102a

Have you found anything relevant by searching the web?

Reverting https://github.com/bazelbuild/bazel/commit/00805727b867d33fd922e63ca82b0d9825ad79fe fixes it. https://github.com/bazelbuild/bazel/issues/12821 suggested adding linkstatic to every C++ test we want to run. That would mean carrying thousands of lines of diffs against upstream, since no reasonable upstream would accept that patch.

Any other information, logs, or outputs that you want to share?

bazel test -c opt --config=remote_execution --config=build_without_the_bytes @aos//aos:condition_test
INFO: Invocation ID: d91e1b30-a290-4282-99c6-029e4102ed35
INFO: Analyzed target @aos//aos:condition_test (86 packages loaded, 20904 targets configured).
INFO: Found 1 test target...
FAIL: @aos//aos:condition_test (see /home/austin/.cache/bazel/_bazel_austin/f5b123ff4a0d503fd09f4c4da36644a6/execroot/repo/bazel-out/k8-opt/testlogs/external/aos/aos/condition_test/test.log)
INFO: From Testing @aos//aos:condition_test:
==================== Test output for @aos//aos:condition_test:
/var/lib/worker/work/1/exec/bazel-out/k8-opt/bin/external/aos/aos/condition_test.runfiles/repo/../aos/aos/condition_test: error while loading shared libraries: libexternal_Saos_Saos_Slibcondition.so: cannot open shared object file: No such file or directory
================================================================================
Target @aos//aos:condition_test up-to-date:
  bazel-bin/external/aos/aos/condition_test
INFO: Elapsed time: 166.038s, Critical Path: 153.48s
INFO: 52 processes: 50 remote cache hit, 2 remote.
@aos//aos:condition_test                                                 FAILED in 127.2s
  /home/austin/.cache/bazel/_bazel_austin/f5b123ff4a0d503fd09f4c4da36644a6/execroot/repo/bazel-out/k8-opt/testlogs/external/aos/aos/condition_test/test.log

Executed 1 out of 1 test: 1 fails remotely.
INFO: Build completed, 1 test FAILED, 52 total actions
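
A note for anyone reading the loader error above: the missing library name looks like a mangled workspace path. As a rough sketch, assuming the name is formed by prefixing `lib` and escaping `/` as `_S` and `_` as `_U` (an assumption inferred from the names in this thread, not something confirmed here), a hypothetical helper `demangle_solib_name` can recover the path to look for in the runfiles tree:

```shell
#!/bin/bash
# Hypothetical helper, not part of Bazel: turn a mangled solib name back into
# a workspace-relative path. The escape scheme is an assumption:
#   leading "lib" prefix, "_S" stands for "/", "_U" stands for "_".
demangle_solib_name() {
  local name="$1"
  # Drop the "lib" prefix that was added in front of the escaped path.
  name="${name#lib}"
  # Undo the escapes. Every "_" in the escaped name starts an escape, so
  # replacing "_S" first and "_U" second cannot mis-decode a sequence.
  printf '%s\n' "${name}" | sed -e 's|_S|/|g' -e 's|_U|_|g'
}

demangle_solib_name "libexternal_Saos_Saos_Slibcondition.so"
# prints external/aos/aos/libcondition.so
```

If that guess is right, the test binary is asking for external/aos/aos/libcondition.so, which helps when checking whether that file was actually staged on the remote worker.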

And to prove I'm not crazy:

bazel test -c opt -k @aos//aos:condition_test
INFO: Build options --experimental_inmemory_dotd_files, --experimental_inmemory_jdeps_files, --extra_execution_platforms, and 2 more have changed, discarding analysis cache.
INFO: Analyzed target @aos//aos:condition_test (0 packages loaded, 20897 targets configured).
INFO: Found 1 test target...
Target @aos//aos:condition_test up-to-date:
  bazel-bin/external/aos/aos/condition_test
INFO: Elapsed time: 12.246s, Critical Path: 10.91s
INFO: 120 processes: 42 internal, 78 linux-sandbox.
INFO: Build completed successfully, 120 total actions
@aos//aos:condition_test                                                 PASSED in 2.3s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 120 total actions

AustinSchuh avatar Jul 29 '22 23:07 AustinSchuh

Does https://github.com/bazelbuild/bazel/pull/14600 fix this issue?

fmeum avatar Jul 30 '22 01:07 fmeum

Aw, I was hopeful. I fixed some merge conflicts (looks like "../".replace() -> Strings.replace("../", ...)), and I still get the same failure.

==================== Test output for @aos//aos:condition_test:
/var/lib/worker/work/3/exec/bazel-out/k8-opt/bin/external/aos/aos/condition_test.runfiles/repo/../aos/aos/condition_test: error while loading shared libraries: libexternal_Saos_Saos_Slibcondition.so: cannot open shared object file: No such file or directory

AustinSchuh avatar Jul 30 '22 02:07 AustinSchuh

Could you check whether https://github.com/bazelbuild/bazel/pull/16008 fixes the issue? It includes an integration test that I distilled from your reproducer.

fmeum avatar Jul 30 '22 21:07 fmeum

It does! Great work, thanks for the prompt response and prompt resolution. I really appreciate it.

FYI, there's the same merge conflict with Strings.replace. Not a hard thing to fix though.

AustinSchuh avatar Aug 01 '22 17:08 AustinSchuh

Assigning to @oquenchil since the linked fix is about cc rules.

coeuvre avatar Aug 02 '22 13:08 coeuvre

Aw, @fmeum, it looks like this fixed all of the tests except one, which expects $(location) and RPATH to agree on the path to the shared libraries. That one is tracked in https://github.com/bazelbuild/bazel/issues/16108, since it smells different enough to be a new bug.

AustinSchuh avatar Aug 16 '22 04:08 AustinSchuh

@bazel-io fork 5.3.1

Wyverald avatar Aug 23 '22 17:08 Wyverald

The fix introduced a regression (see https://github.com/bazelbuild/bazel/pull/16008#issuecomment-1224267553), which we will probably need a patch release for.

Wyverald avatar Aug 23 '22 17:08 Wyverald

Seeing a very similar problem when upgrading from 5.2.0 to 5.3.0:

error while loading shared libraries: libexternal_Sopenvkl_Slibispc_Uutil_Uispc.so: cannot open shared object file: No such file or directory

But trying out 5.3.1rc2, it looks like that particular problem is resolved for us.

philsc avatar Sep 16 '22 17:09 philsc

Hello @AustinSchuh, are you still seeing this issue with release 5.3.1?

sgowroji avatar Oct 10 '22 08:10 sgowroji

5.3.1 works for me. I'm hitting https://github.com/bazelbuild/bazel/issues/16108 but that feels separate.

AustinSchuh avatar Oct 10 '22 18:10 AustinSchuh

Thanks for the update @AustinSchuh!

ShreeM01 avatar Oct 10 '22 18:10 ShreeM01