bazel-gazelle
Some go_repository gazelle generations are slow, could be pre-computed or cached
A common pathological case is @com_github_docker_docker
which can spend hundreds of seconds generating BUILD files.
Users are stuck waiting for this to run anytime the repository rule is invalidated.
@siggisim had the clever solution of checking in the resulting generated BUILD files rather than computing them on the fly, using a script that converts go_repository to http_archive: https://github.com/buildbuddy-io/buildbuddy/blob/14ac4f6d9e3df578e31c129add910d176a63d075/deps.bzl#L876-L884
On Slack, @sluongng proposed that the results are cacheable: https://bazelbuild.slack.com/archives/CDBP88Z0D/p1647012317774269
Naively speaking, a Gazelle cache is fairly easy to implement:
- Enable/disable it by setting an environment variable pointing to a cache path
- Only cache if the go_repository sha256 is set, and use it as the cache key
- Under $HOME/.cache/gazelle, store the patch file under ./<sha256>.patch
- If no cache entry is found, generate the BUILD files, then store the result in the cache
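A minimal sketch of that flow, purely illustrative: the GAZELLE_BUILD_FILE_CACHE variable, the generate callback, and the exact patch layout are assumptions, not existing Gazelle behavior.

// Sketch only: none of these names exist in Gazelle today.
package buildcache

import (
	"os"
	"path/filepath"
)

// cachedPatch returns the BUILD file patch for one external repository,
// generating and storing it only on a cache miss. The go_repository's
// sha256 attribute doubles as the cache key.
func cachedPatch(sha256 string, generate func() ([]byte, error)) ([]byte, error) {
	dir := os.Getenv("GAZELLE_BUILD_FILE_CACHE") // e.g. $HOME/.cache/gazelle
	if dir == "" || sha256 == "" {
		// Cache disabled, or no stable key available: always regenerate.
		return generate()
	}
	path := filepath.Join(dir, sha256+".patch")
	if patch, err := os.ReadFile(path); err == nil {
		return patch, nil // cache hit
	}
	patch, err := generate() // cache miss: run the normal Gazelle generation
	if err != nil {
		return nil, err
	}
	if err := os.MkdirAll(dir, 0o755); err == nil {
		_ = os.WriteFile(path, patch, 0o644) // best-effort store
	}
	return patch, nil
}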
The problem is compounded by the fact that Bazel reports this time as "Fetching", e.g.
Fetching @com_github_docker_docker; fetching 400s
which sends users down a completely wrong path for how to remediate it.
https://github.com/bazelbuild/bazel-gazelle/pull/1191 fixes that by adding progress reporting, thanks @sluongng for the idea.
> Under $HOME/.cache/gazelle, store the patch file under ./<sha256>.patch
Thinking on this a bit more, it's worth noting that this design is exclusively solving this for Go and not for other languages / extensions. So perhaps a more appropriate cache path would be $HOME/.cache/gazelle/go/<sha256>.patch, or alternatively $HOME/.cache/gazelle/go/h1:<sha256-base64>.patch if we are using the sum value in the go.sum file. So the name of the environment variable flag would probably need to be something Go-specific as well.
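For illustration only, a path could be derived from either key roughly like this; goPatchPath and the go/ sub-directory are hypothetical, and the h1: value is rewritten into a filename-safe form since go.sum base64 can contain / and + characters.

// Sketch only: goPatchPath and the go/ layout are hypothetical.
package buildcache

import (
	"path/filepath"
	"strings"
)

// goPatchPath derives a Go-specific cache path, preferring the sha256
// attribute and falling back to the go.sum "h1:" value.
func goPatchPath(cacheDir, sha256, sum string) string {
	key := sha256
	if key == "" && strings.HasPrefix(sum, "h1:") {
		safe := strings.NewReplacer("/", "_", "+", "-").Replace(strings.TrimPrefix(sum, "h1:"))
		key = "h1-" + safe
	}
	return filepath.Join(cacheDir, "go", key+".patch")
}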
Other Gazelle extensions / languages may have their own mechanism for generating external dependencies, so it's not a good idea to try to cover that use case here. Instead, we can document that external dependencies can be large and that file-walking can be an expensive operation; therefore, extension authors are encouraged to implement their own language-specific BUILD file generation cache for such cases. 🤔
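If it helps, here is one possible shape such a per-extension cache could take; this interface is purely a sketch and nothing like it exists in Gazelle's extension API today.

// Sketch only: a hypothetical per-extension cache shape.
package buildcache

// BuildFileCache is what a language extension could consult before walking
// an external repository's file tree.
type BuildFileCache interface {
	// Get returns a previously generated patch for key, if present.
	Get(key string) (patch []byte, ok bool)
	// Put stores the patch generated for key.
	Put(key string, patch []byte)
}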
So if I am understanding the proposal correctly, it is to cache a patch file of Gazelle generations for each repository, with sha256/sum being the key.
Aren't the majority of changes to a go_repository rule caused by a version change (and a corresponding sha256/sum change)? Meaning this cache would only be helpful when a go_repository rule is invalidated due to other (hopefully less frequent) reasons, such as a Bazel/Gazelle upgrade or a change in the repo config (a WORKSPACE dependency is added or removed).
How frequently are you seeing go_repository rules being re-evaluated outside of sha256/sum changes? Maybe there is a bug or something about your repo's configuration that is causing them to be re-evaluated more often than they should be?
@blico You raised a good point here.
In most cases today, such a BUILD file generation cache would not be useful at all, because Gazelle's file walk is relatively fast for most Go repositories.
However, for cases such as https://github.com/docker/docker, https://github.com/googleapis/google-api-go-client, or a small internal project depending on a big internal monorepo, the file walk is expensive. So if we can eliminate the need to walk files for these external dependencies, it would be a lot better.
Here is an example setup: generate these cache files in CI, then for each GitHub Codespaces / GitPod dev environment, download the cache from CI and pre-seed it. Using this cache in combination with GO_REPOSITORY_USE_HOST_CACHE would cut the time for go_repository down to almost instant. This is especially relevant when you have to cater to an audience who don't fully trust the Bazel cache and have a tendency to bazel clean --expunge && rm -rf ~/.cache/bazel when they run into issues.
Similar to GO_REPOSITORY_USE_HOST_CACHE, I think this BUILD gen cache is best left as an optional, opt-in feature.
Heh, coincidentally I was just looking at hellish Docker dependencies. I found the best way to deal with this is just to use the vendor folder, which can be achieved by downloading an archive rather than using go mod download.
go_repository(
name = "com_github_docker_cli",
build_file_proto_mode = "disable_global",
importpath = "github.com/docker/cli",
sha256 = "5aa34f34b2c6f21ade2227edd08b803d972842d1abd5e70d7e4e758f966e4e3c",
strip_prefix = "cli-20.10.13",
urls = ["https://github.com/docker/cli/archive/refs/tags/v20.10.13.tar.gz"],
) # keep
go_repository(
name = "com_github_docker_docker",
build_file_proto_mode = "disable_global",
importpath = "github.com/docker/docker",
sha256 = "bd82205bbb39e835c4e9c691c9d5bb8dea40ace847f80b030c736faba95b0b67",
strip_prefix = "moby-20.10.13",
urls = ["https://github.com/moby/moby/archive/refs/tags/v20.10.13.tar.gz"],
) # keep
I wonder if it could also make sense to rename the files vendor.mod and vendor.sum to go.mod and go.sum respectively? I don't know how well Gazelle would deal with that, since it would not be included in update-repos...
Anyway, I hope the above snippet is helpful for some.
Thinking a bit further on this topic:
An obvious downside of implementing such a cache at the Starlark level is that changes in Gazelle's build config (or repo config) would not be recognized, i.e. updating a WORKSPACE directive such as # gazelle:repository_macro would be really hard to detect.
To get the build config to be part of the cache key computation, we would need to implement the cache in the Go layer of Gazelle, when it's called with gazelle -go_repository_mode and, perhaps, with an additional flag such as -enable_build_file_cache <cache-dir>.
This would definitely increase the complexity of the implementation significantly, but the accuracy is probably desirable, as updating a gazelle directive should invalidate the cache.
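As a rough sketch of that direction (the flag name is the hypothetical one from the paragraph above, and the exact inputs hashed into the key are just an assumption):

// Sketch only: -enable_build_file_cache is the hypothetical flag discussed
// above; the inputs below are illustrative.
package buildcache

import (
	"crypto/sha256"
	"encoding/hex"
	"path/filepath"
)

// cacheKey mixes the archive checksum with everything that should invalidate
// generated BUILD files, e.g. the Gazelle version and the serialized repo
// config that carries directives such as # gazelle:repository_macro.
func cacheKey(archiveSHA256, gazelleVersion string, repoConfig []byte) string {
	h := sha256.New()
	h.Write([]byte(archiveSHA256))
	h.Write([]byte(gazelleVersion))
	h.Write(repoConfig)
	return hex.EncodeToString(h.Sum(nil))
}

// patchPath places the cache entry under the directory passed via the flag.
func patchPath(cacheDir, key string) string {
	return filepath.Join(cacheDir, key+".patch")
}

With a key like that, updating a directive changes the key, so a stale entry is simply never hit again.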
@uhthomas could you explain a bit more how that helps to avoid using go mod download? Doesn't Gazelle still spend just as long generating the BUILD files? Or is that instead a workaround for the repository invalidation?
I'm actually not too sure how long Gazelle spends generating those build files, but I can't imagine it's very long. Here's a benchmark comparing github.com/moby/moby with go mod download versus vendor.
Rules
go_repository(
name = "com_github_docker_docker",
build_file_proto_mode = "disable_global",
importpath = "github.com/docker/docker",
sha256 = "bd82205bbb39e835c4e9c691c9d5bb8dea40ace847f80b030c736faba95b0b67",
strip_prefix = "moby-20.10.13",
urls = ["https://github.com/moby/moby/archive/refs/tags/v20.10.13.tar.gz"],
) # keep
go_repository(
name = "com_github_docker_docker_mod",
build_file_proto_mode = "disable_global",
importpath = "github.com/docker/docker",
replace = "github.com/moby/moby",
sum = "h1:EzlPdbJmYtz9QwFkLNvnEF8LoVFddaBzBsJjBH9eMHc=",
version = "v20.10.13+incompatible",
)
Benchmark
❯ hyperfine -p 'bazel clean --expunge' -w 1 'bazel build @com_github_docker_docker//api' 'bazel build @com_github_docker_docker_mod//api' --export-markdown out.md
Benchmark 1: bazel build @com_github_docker_docker//api
Time (mean ± σ): 79.276 s ± 5.773 s [User: 0.070 s, System: 0.116 s]
Range (min … max): 71.379 s … 84.786 s 10 runs
Benchmark 2: bazel build @com_github_docker_docker_mod//api
Time (mean ± σ): 367.550 s ± 85.166 s [User: 0.079 s, System: 0.585 s]
Range (min … max): 292.416 s … 574.268 s 10 runs
Summary
'bazel build @com_github_docker_docker//api' ran
4.64 ± 1.13 times faster than 'bazel build @com_github_docker_docker_mod//api'
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| bazel build @com_github_docker_docker//api | 79.276 ± 5.773 | 71.379 | 84.786 | 1.00 |
| bazel build @com_github_docker_docker_mod//api | 367.550 ± 85.166 | 292.416 | 574.268 | 4.64 ± 1.13 |
The times are inclusive of fetching and building both the Go SDK and Gazelle, so 70s is really not bad for vendor.
It seems like the Go toolchain just gets really confused when presented with github.com/docker/docker and spends ages trying to resolve dependencies. The output will typically spam thousands of go: downloading ... lines. See https://gist.github.com/uhthomas/21dba34284bc9365178aeb78543d8702.
> so 70s is really not bad for vendor
I think this is subjective: (1) how significant 70s is, and (2) the actual value can vary depending on file system performance.
For (1), I would like to note that in my case, where I try to keep the P50 of a CI pipeline below 5 minutes, a constant cost of 70s is quite significant (70s on top of a 300s budget is roughly a 23% increase). I can imagine how other folks optimizing for P90 could look at a 70s increment with 🙄.
For (2), here are some additional stats:
I used
http_archive(
name = "com_github_docker_docker_with_patch",
patch_args = [
"-s",
"-p0",
],
patches = ["@//:com_github_docker_docker.patch"],
sha256 = "358b4cdf0e2f11b7a99087e349b41d0f5a0a5b9da83b0ef0609ad08160bcd5f0",
strip_prefix = "moby-363e9a88a11be517d9e8c65c998ff56f774eb4dc",
urls = ["https://github.com/docker/docker/archive/363e9a88a11be517d9e8c65c998ff56f774eb4dc.zip"],
)
load("@bazel_gazelle//:deps.bzl", "go_repository")
go_repository(
name = "com_github_docker_docker",
build_file_proto_mode = "disable_global",
importpath = "github.com/docker/docker",
sha256 = "358b4cdf0e2f11b7a99087e349b41d0f5a0a5b9da83b0ef0609ad08160bcd5f0",
strip_prefix = "moby-363e9a88a11be517d9e8c65c998ff56f774eb4dc",
urls = ["https://github.com/docker/docker/archive/363e9a88a11be517d9e8c65c998ff56f774eb4dc.zip"],
) # keep
> bazel build --nobuild '@com_github_docker_docker//...' --profile=profile.json
Both targets should use Bazel's native downloader and cache the zip in the repository cache. The patch file is copy-pasted from BuildBuddy's repository.
Here is what the profiling looks like. Note that the purple stack on the left is com_github_docker_docker_with_patch; here it is zoomed in.
Some stats:
- //external:com_github_docker_docker_with_patch duration: 1s 932ms 240us
  - download_and_extract: 1s 250ms 845us
  - patch: 649ms 364us
- //external:com_github_docker_docker duration: 36s 252ms 314us
  - download_and_extract: 1s 166ms 577us
  - env_execute: 35s 33ms 298us
So Gazelle scales linearly with work tree complexity: the more files and directories, the more syscalls Gazelle and related extensions need to make to the file system, and thus the more expensive it gets. Especially on network file systems with limited bandwidth, such as EBS or NFS, this slowness can be amplified significantly.
# Directory count
> fd --type d '.*' $(bazel info output_base 2>/dev/null)/external/com_github_docker_docker | wc -l
1304
# Bazel package count
> for i in $(seq 1 10); do
echo "depth: $i";
fd --min-depth $i --max-depth $i BUILD.bazel $(bazel info output_base 2>/dev/null)/external/com_github_docker_docker |\
wc -l;
done
depth: 1
0
depth: 2
25
depth: 3
119
depth: 4
89
depth: 5
135
depth: 6
246
depth: 7
256
depth: 8
91
depth: 9
30
depth: 10
0
Caching the result into a patch file essentially makes the generation constant-time, which can be desirable for some use cases.
@sluongng You mention you are optimizing for your p50 CI times here; do your CI nodes keep a Bazel output_base cache between runs, or does every run use a "fresh" cache? To add to my earlier post, Gazelle should only have to generate build files on fresh nodes or when a rule has been invalidated, meaning it ideally shouldn't contribute to your p50 latency.
To optimize go_repository rules on "fresh CI nodes" I see value in pre-warming them with the "patch cache" that you are proposing. Although, I do wonder if a more general approach would be more suitable here, for example pre-loading all of Bazel's output_base/external directory.
@uhthomas Since the majority of the time spent for github.com/docker/docker is trying to resolve dependencies, I think https://github.com/bazelbuild/bazel-gazelle/issues/602 would be a good optimization that I can prioritize adding. tldr; we can make it so go_repository rules do not need to make network calls when resolving dependencies.
Invalidations can be pretty frequent if you're keeping gazelle and rules_go up-to-date in your repo. Engineers switch back and forth between branches where one is rebased after such an update and the other isn't, and then they see this Fetching @com_github_docker_docker blocking their work a good portion of the time.
I tried to mimic the BuildBuddy vendor.sh script; however, I found that no matter how precisely I matched up the version, it caused different code to run than what the go_repository had fetched.
So I made a different recipe for the workaround documented here: https://github.com/bazelbuild/bazel-gazelle/pull/1198, which is basically just to hack the existing spot where Gazelle generation is run and have it dump the result into a patch file you can copy back out.
With #1201 it took me only 16s to cleanly fetch github.com/docker/docker. Although not as fast as the patch file approach, the PR provides an immediate speed-up for all go_repository rules without requiring any additional overhead (checking in patch files, etc.).
@alexeagle @uhthomas @sluongng do you want to give that PR a try?
@blico that's fantastic; it did seem like a needless de-optimization was happening.
I tested your PR and got this timing locally:
# invalidate the go_repository, e.g. add `build_file_name = "BUILD.bazel,BUILD,invalidate1"`
% time bazel fetch @com_github_docker_docker//api
INFO: All external dependencies fetched successfully.
Loading: 20 packages loaded
bazel fetch @com_github_docker_docker//api 0.05s user 0.07s system 0% cpu 18.656 total
compared with the original network access de-opt
% time bazel fetch @com_github_docker_docker//api
Loading: 0 packages loaded
Fetching @com_github_docker_docker; Running gazelle to generate BUILD files. Too slow? See https://github.com/bazelbuild/bazel-gazelle/issues/1190 159s
INFO: All external dependencies fetched successfully.
Loading: 1 packages loaded
bazel fetch @com_github_docker_docker//api 0.05s user 0.07s system 0% cpu 2:10.30 total
and compared with the checked-in patch file
% time bazel fetch @com_github_docker_docker//api
INFO: All external dependencies fetched successfully.
Loading: 1 packages loaded
bazel fetch @com_github_docker_docker//api 0.04s user 0.05s system 0% cpu 17.775 total
So I think you've totally solved this issue 🥳
I have done an investigation of this Docker fetch issue. In my understanding, the situation is as follows:
- My project uses only a small part of the github.com/docker/docker package.
- Because github.com/docker/docker has no go.mod file, the go mod commands don't see the whole module dependency graph.
- go mod tidy on my project adds to go.mod and go.sum only the transitive dependencies of the little part of github.com/docker/docker that it uses.
- Gazelle generates the WORKSPACE based on go build info (provided by the go mod download -json command) and therefore has an incomplete github.com/docker/docker module dependency list (just enough to build my project), but Gazelle generates BUILD files for the whole github.com/docker/docker module.
- For every unknown package, Gazelle calls the go get -d command and spends a lot of time (in my case it takes ~20 minutes in total).
As a result, Gazelle spends a lot of time resolving transitive dependencies of unused third-party code.
My workaround:
- collect the go get -d packages via #1170;
- add these "unknown" dependencies to a file with a never-built go build constraint like the one below (this construction adds the missing dependencies to the go.mod file):
//go:build tools
// +build tools

package tools
import (
_ "github.com/moby/buildkit"
_ "github.com/olivere/env"
_ "rsc.io/letsencrypt"
...
)
Theoretically, Gazelle could simply ignore unknown dependencies without executing the go get -d command (these dependencies are not provided by go.mod because they are located in unused packages).
As I understand it, this should be correct when the dependency list is based on a go.mod file, but it may break the behavior in other, more exotic cases.
> Theoretically, Gazelle could simply ignore unknown dependencies without executing the go get -d command (these dependencies are not provided by go.mod because they are located in unused packages).
Today I added a patch replacing the defaultModInfo function with a stub:
// Stub: always report an empty module path (and no error), so Gazelle skips
// the expensive network lookup for unknown import paths.
func defaultModInfo(rc *RemoteCache, importPath string) (modPath string, err error) {
	return "", nil
}
It works perfectly on our go.mod-based project.
I tried to understand when it is actually useful to resolve an unknown import package, and couldn't come up with a case:
- if the code with the unknown import is not used, then it does not matter;
- if the code with the unknown import is used, then the build will still fail during compilation, since that repository is not in the WORKSPACE.