continuous-integration
continuous-integration copied to clipboard
rules_go fails with bazel@HEAD, but not tied to a commit
rules_go fails with bazel@HEAD, but it doesn't seem tied to any particular bazel commit. The failing tests are coverage-related, and seem to be that the coverage file is not found
https://buildkite.com/bazel/culprit-finder/builds/171#863978e5-2c65-4b87-8f9f-3aa1487bc77c
I also tried triggering the rules_go CI with release Bazel, and got partial same failures:
https://buildkite.com/bazel/rules-go-golang/builds/1196
There hasn't been a recent commit in rules_go, so I have to assume this is a CI issue.
See also https://github.com/bazelbuild/rules_go/issues/2100
@c-parsons I think this is caused by the release of Bazel 0.27.0.
We didn't change anything on CI itself in quite a while.
Pardon my ignorance, but how would the release of Bazel 0.27.0 affect testing rules_go against Bazel@HEAD at various commits?
A Bazel release affecting rules_go CI for Bazel@Release would make sense, but I'd expect to be able to bisect the Bazel@HEAD failures and tie them to a Bazel commit...
@c-parsons That's a good question, but rules_go is definitely green with 0.26.1 and red with 0.27.0: https://buildkite.com/bazel/rules-go-golang/builds/1203 (upper "Ubuntu 18.04" is Bazel 0.26.1, lower "Ubuntu 18.04" is with Bazel 0.27.0).
The "known good" commit that culprit finder ran with in your linked build above (a44ea875254c5a630000f1838764e525cdb864ce) is only 7 days old, so maybe it's just not old enough? Maybe the breakage was much earlier?
I'm running another culprit finder here with good commit = 0.27.0 and bad commit = 0.26.1, let's see how that goes: https://buildkite.com/bazel/culprit-finder/builds/172
Culprit finder is failing, because it can't find the Bazel binaries it needs on GCS. :(
I have to leave the office now - maybe you can just bisect this locally on your machine?
So, this is failed not tied to a release because these tests uses some details about the host machine, particularly it will, in test, run /usr/bin/bazel. I bet that when we released Bazel 0.27.0, the CI machines permanently have 0.27.0 at that location, even for bazel@HEAD tests.
I think we might be able to alter which bazel is used in this test runner by setting the BAZEL environment variable...
Still, what frustrating nondeterminism...
cc @jayconrod
Good catch! This explains why the target was failing for me with old commits and old Bazel versions.
I opened bazelbuild/bazel#8670. I believe it's a bug in coverage.
Oh, wow. Yes, due to the Bazel detection logic in rules_go's bazel_tests.bzl, the "Bazel@HEAD" tests never ran these tests with HEAD, instead they always used the latest release. That's why we didn't detect this breakage before 0.27.0 was released.
I have to think about how to prevent this from happening on CI next week.
The test passes locally when I use 0.26.1 and I set the BAZEL variable:
BAZEL=~/.cache/bazelisk/bin/bazel-0.26.1-linux-x86_64 bazelisk test //tests/core/coverage:coverage_test_test
So the breakage is caused by 0.27. But I cannot reproduce it with any incompatible flag, so it's a bug in Bazel.
@laurentlb Did we find the bug in Bazel? Should we rollback the culprit?
@meteorcloudy The issue was bazelbuild/bazel#8670. The fix is cherry-picked in bazel 0.27.1.rc1, and I've verified it fixes rules_go CI (log).
@jayconrod Thanks for the notice!
For the reference, https://github.com/bazelbuild/rules_go/blob/master/tests/bazel_tests.bzl#L331 is the place where rules_go try to find Bazel binary. Bazel CI currently doesn't set BAZEL environment variable so this logic is incorrect when running on Bazel CI.
@philwo @jayconrod do you have ideas how we can make this work? Can Bazel CI set BAZEL env var? Can we communicate Bazel binary in some other way? Should we do it at all?
@hlopko The issue here was a regression in coverage support in 0.27.0. It should be fixed in 0.27.1. I think the tests were just picking up bazel from PATH.
I think the BAZEL environment variable was a vestige from Jenkins. It seems better to just get the target version of bazel from PATH though. Most tools and local tests will pick it up automatically without having to be customized for this environment.