[8.0.0rc1] Builds hang on Windows with --experimental_collect_worker_data_in_profiler.
Description of the bug:
After upgrading to Bazel 8 (from a pre-release of Bazel 7.4.0), we're observing Bazel hangs when building our codebase on Windows. The hangs happen both on CI and locally, but don't seem to be 100% reproducible.
I've attached a Bazel profile, a compact execution log, and jstack traces of what I believe are the two relevant Java processes for the build. Let me know if I can support you with more debug information.
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I've not been able to reproduce this on our public codebase, and will investigate further reductions only if the current debug information isn't sufficient.
Which operating system are you running Bazel on?
Windows 11
What is the output of bazel info release?
No response
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD ?
No response
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
To me, the stack traces and the profile look indistinguishable from those of a build that is just waiting for long-running actions to finish (but of course they are taking very long).
If possible, could you try to bisect this down to a particular rolling release or even commit? Bazelisk accepts individual Bazel commits.
@bazel-io flag
Okay, we're getting somewhere: disabling --experimental_collect_worker_data_in_profiler stops the hangs from occurring (so this might not be a release blocker after all). We also had this flag enabled on 7.3/7.4, but it might be that the option is just silently ignored on those branches?
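For anyone hitting the same hang, a minimal workaround sketch, assuming the flag is currently being enabled somewhere in your own bazelrc files: negate it explicitly in the workspace `.bazelrc`. Applying it to the `build` command is an assumption on my side; adjust the command name if you set the flag elsewhere.

```
# .bazelrc (workspace root)
# Workaround sketch: explicitly turn off worker data collection in the profiler
# until a Bazel version containing the fix is in use.
build --noexperimental_collect_worker_data_in_profiler
```

An explicit --experimental_collect_worker_data_in_profiler passed on the command line would still take precedence over this line.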
I reproduced the hangs as far back as (at least) 8.0.0-pre20240516.1; then, in my manual bisecting, I switched to an older version that didn't have the flag.
Enabling that flag by default was reverted in https://github.com/bazelbuild/bazel/commit/a9525c701125664bb9daf5637084e85dff186d31 due to flakiness in the multiplex_worker tests.
Unfortunately, there are no PRs or external history associated with this flag.
It didn't do anything on Windows before the revert: https://github.com/bazelbuild/bazel/commit/a9525c701125664bb9daf5637084e85dff186d31#diff-b572d41bff84fa61b397e97467a898b32baf118421a0b06859e3fa04c556a7ebL219
I don't know how it works, but maybe this if statement should be brought back?
@bazel-io fork 8.0.0
@zhengwei143 @bigelephant29 just checking in, what's the status of this? Do we know if it's a regression in 8.0.0rc1/2?
Do we know if it's a regression in 8.0.0rc1/2?
Yes, it is. @bigelephant29 I think we should fix this issue separately from enabling --experimental_collect_worker_data_in_profiler by default. @fmeum has identified the cause in https://github.com/bazelbuild/bazel/issues/23952#issuecomment-2407308701.
In that case, we probably also want to cherry-pick https://github.com/bazelbuild/bazel/commit/b2664b1e4d60b5de8536242721f09841b30e6610.
A fix for this issue has been included in Bazel 8.0.0 RC3. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=8.0.0rc3. Thanks!
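A minimal sketch of that check, assuming a POSIX-style shell (for example Git Bash on Windows) and Bazelisk installed as `bazelisk` on the PATH. The `command -v` guard is only there so the snippet is harmless on machines without Bazelisk.

```shell
# Pin Bazelisk to the release candidate for this shell session only.
export USE_BAZEL_VERSION=8.0.0rc3

# Confirm which Bazel version Bazelisk resolves (skipped if bazelisk is not on PATH).
if command -v bazelisk >/dev/null 2>&1; then
  bazelisk version
fi
```

In cmd.exe the equivalent of the export line would be `set USE_BAZEL_VERSION=8.0.0rc3`. Re-running the previously hanging build in the same session then exercises the release candidate.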