
[8.0.0rc1] Builds hang on Windows with --experimental_collect_worker_data_in_profiler.

Open criemen opened this issue 1 year ago • 5 comments

Description of the bug:

After upgrading to Bazel 8 (from a pre-release of Bazel 7.4.0), we're seeing Bazel hang when building our codebase on Windows. The hangs happen both on CI and locally, but aren't 100% reproducible.

I've attached a bazel profile, compact execution log, and jstack traces of the two relevant (I believe) java processes for the build. Let me know if I can support you with more debug information.

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I've not been able to reproduce this on our public codebase, and will investigate further reductions only if the current debug information isn't sufficient.

Which operating system are you running Bazel on?

Windows 11

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

hang-debugging.zip

criemen avatar Oct 11 '24 09:10 criemen

To me, the stack traces and the profile look indistinguishable from a build that is just waiting for long-running actions to finish (though of course they are taking very long).

If possible, could you try to bisect this down to a particular rolling release or even commit? Bazelisk accepts individual Bazel commits.
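As a rough sketch of what such a bisection could look like, Bazelisk's --bisect flag takes a GOOD..BAD range followed by the Bazel command to run at each step. The helper name bisect_bazel, the version range, and the target below are placeholders for illustration, not values from this report.

```shell
# Hypothetical helper: bisect Bazel between a known-good and a known-bad version.
# Usage: bisect_bazel GOOD BAD <bazel command and arguments...>
bisect_bazel() {
  good="$1"
  bad="$2"
  shift 2
  # Bazelisk runs the given command at each candidate commit in GOOD..BAD.
  bazelisk --bisect="${good}..${bad}" "$@"
}

# Example call (placeholder range and target):
# bisect_bazel 7.3.1 8.0.0rc1 build //your:target
```

Since a bisection step is judged by whether the command succeeds, a hang would probably need to be turned into a failure first, for example by running the build under a timeout; that part is left out of the sketch.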

fmeum avatar Oct 11 '24 10:10 fmeum

@bazel-io flag

fmeum avatar Oct 11 '24 10:10 fmeum

Okay we're getting somewhere: disabling --experimental_collect_worker_data_in_profiler stops the hangs from occurring (so this might not be a release blocker after all). We also had this enabled on 7.3/7.4, but it might be that the option is just silently ignored on those branches?

I could reproduce the hangs as far back as (at least) 8.0.0-pre20240516.1; then, in my manual bisecting, I switched to an older version that didn't have the flag.

Enabling that flag by default was reverted, due to flakiness in the multiplex_worker tests in https://github.com/bazelbuild/bazel/commit/a9525c701125664bb9daf5637084e85dff186d31

Unfortunately, there are no PRs or external history associated with this flag.
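The workaround mentioned above, turning the flag off, might look like the following. This is a minimal sketch that assumes the flag is an ordinary boolean option accepting Bazel's usual --no prefix; the helper name and the target are placeholders.

```shell
# Hypothetical helper: build with worker data collection in the profiler disabled.
# Assumes --experimental_collect_worker_data_in_profiler is a boolean flag,
# so the --no prefix switches it off.
build_without_worker_profiling() {
  bazel build --noexperimental_collect_worker_data_in_profiler "$@"
}

# Example call (placeholder target):
# build_without_worker_profiling //your:target
```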

criemen avatar Oct 11 '24 11:10 criemen

It didn't do anything on Windows before the revert: https://github.com/bazelbuild/bazel/commit/a9525c701125664bb9daf5637084e85dff186d31#diff-b572d41bff84fa61b397e97467a898b32baf118421a0b06859e3fa04c556a7ebL219

I don't know how it works, but maybe this if check should be brought back?

fmeum avatar Oct 11 '24 12:10 fmeum

@bazel-io fork 8.0.0

iancha1992 avatar Oct 11 '24 16:10 iancha1992

@zhengwei143 @bigelephant29 just checking in, what's the status of this? Do we know if it's a regression in 8.0.0rc1/2?

Wyverald avatar Nov 15 '24 19:11 Wyverald

> Do we know if it's a regression in 8.0.0rc1/2?

Yes, it is. @bigelephant29 I think we should fix this issue separately from enabling --experimental_collect_worker_data_in_profiler by default. @fmeum has identified the issue in https://github.com/bazelbuild/bazel/issues/23952#issuecomment-2407308701.

zhengwei143 avatar Nov 18 '24 03:11 zhengwei143

In that case, we probably also want to cherry-pick https://github.com/bazelbuild/bazel/commit/b2664b1e4d60b5de8536242721f09841b30e6610.

meisterT avatar Nov 19 '24 07:11 meisterT

A fix for this issue has been included in Bazel 8.0.0 RC3. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=8.0.0rc3. Thanks!
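For anyone testing with Bazelisk, pinning the release candidate for a single invocation could be sketched as below. Only the USE_BAZEL_VERSION variable and the 8.0.0rc3 value come from the message above; the helper name and the target are placeholders.

```shell
# Hypothetical helper: run a build under a specific Bazel version via Bazelisk.
# Usage: build_with_bazel_version VERSION <targets and flags...>
build_with_bazel_version() {
  version="$1"
  shift
  # USE_BAZEL_VERSION tells Bazelisk which Bazel release to download and run.
  USE_BAZEL_VERSION="$version" bazelisk build "$@"
}

# Example call (placeholder target):
# build_with_bazel_version 8.0.0rc3 //your:target
```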

iancha1992 avatar Nov 22 '24 22:11 iancha1992