buildkube
Bazel Remote Cache + Remote Execution in Kubernetes
Bazel | REAPI | Kubernetes |
buildkube uses rules_docker and rules_k8s to build and deploy bazel-buildfarm (Java), bazel-buildbarn (Go), and/or buildgrid (Python) into an existing Kubernetes cluster. These are the three known open-source server-side implementations of the remote-execution-api (REAPI); alongside them there is the closed-source Google Remote Build Execution (RBE) service (alpha).
Known clients of the REAPI include bazel itself, recc, and possibly pants.
INSTRUCTIONS
- Clone this repository.
- Edit the k8s_defaults rule in the WORKSPACE file to point to your Kubernetes cluster (should match the output of $ kubectl config current-context).
- Build and deploy an implementation, for example:
  $ (cd farm/ && make install)
- In a separate terminal, establish port-forwarding to the server implementation:
  $ (cd farm/ && make port-forward)
- Clone the abseil repository as a test case:
  $ make abseil_clone
- Compile abseil remotely:
  $ make abseil
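The deploy-and-build steps above could be scripted roughly as follows. This is only a sketch: the function names `describe` and `run_all` are hypothetical, the commands are the ones listed in the steps, and cloning this repository and editing WORKSPACE are assumed to be done already. With `dry_run=True` nothing is executed and the commands are just returned as text.

```python
import subprocess

# Commands taken from the steps above; each entry is (working directory, argv).
INSTALL = ("farm", ["make", "install"])
PORT_FORWARD = ("farm", ["make", "port-forward"])
BUILD_STEPS = [(".", ["make", "abseil_clone"]), (".", ["make", "abseil"])]


def describe(step):
    """Render a step as the shell line it corresponds to."""
    cwd, argv = step
    cmd = " ".join(argv)
    return cmd if cwd == "." else f"(cd {cwd}/ && {cmd})"


def run_all(dry_run=True):
    """Deploy, port-forward in the background, then build abseil remotely."""
    if dry_run:
        return [describe(s) for s in [INSTALL, PORT_FORWARD, *BUILD_STEPS]]
    subprocess.run(INSTALL[1], cwd=INSTALL[0], check=True)
    # Port-forwarding blocks, so it is started as a background process.
    # A real script would wait for the forwarded port to be ready before
    # building; this sketch does not.
    forward = subprocess.Popen(PORT_FORWARD[1], cwd=PORT_FORWARD[0])
    try:
        for cwd, argv in BUILD_STEPS:
            subprocess.run(argv, cwd=cwd, check=True)
    finally:
        forward.terminate()
    return []
```

Calling `run_all()` with the default arguments only lists the commands, which is a cheap way to review them before running against a real cluster.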
NOTES
- Bazel 0.17.1 or higher is required (primarily tested on 0.17.2 on an Ubuntu laptop).
- Run all tests via:
  $ bazel test //...
- Each implementation goes in its own namespace. To see all:
  $ kubectl get pods --all-namespaces
- Consider adjusting replicas in the deploy.yaml files and/or the bazelrc file.
OBSERVATIONS
General
- Logging in all three implementations is scant, which makes debugging difficult. Prometheus metrics are available in the BuildBarn implementation (not examined thus far).
BuildFarm
- BuildFarm worker does not detect if the server goes down. You must manually run $ kubectl delete pod --selector=k8s-app=worker when re-installing or updating the server deployment.
- When a worker registers itself with the server (operation-queue), it provides a dict of key:value pairs that must match the action execution requirements. In particular, the container-image key in worker.config MUST exactly match the rbe_ubuntu image tag.
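To illustrate the matching rule, here is a minimal sketch. It assumes "match" means that every key the action requires must be present on the worker with an identical string value; the real BuildFarm logic may differ. The function name and the image values are hypothetical placeholders, not the actual rbe_ubuntu tag.

```python
def worker_matches(worker_properties, action_requirements):
    """Return True if the worker satisfies every requirement exactly.

    Assumption: matching is plain string equality per key, so even a
    slightly different image tag counts as a mismatch.
    """
    return all(
        worker_properties.get(key) == value
        for key, value in action_requirements.items()
    )


# Hypothetical values for illustration only.
worker = {"container-image": "docker://example/rbe-ubuntu:tag-a"}
same_tag = {"container-image": "docker://example/rbe-ubuntu:tag-a"}
other_tag = {"container-image": "docker://example/rbe-ubuntu:tag-b"}

print(worker_matches(worker, same_tag))   # identical tag
print(worker_matches(worker, other_tag))  # different tag
```

Under this assumption a worker whose container-image differs in any character would never be handed the action, which is consistent with the exact-match requirement noted above.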
BuildBarn
- After spinning up a new install, the service seems flaky at first, and builds tend to fail with several errors like:
/tmp/abseil-cpp/absl/utility/BUILD.bazel:22:1: C++ compilation of rule '//absl/utility:utility_test' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed catastrophically.
NOTE(@EdSchouten): There are three ways to alleviate this issue:
- Spawn more workers on your cluster.
- Pass in an explicit --jobs= to the build that is the same order of magnitude as the number of workers.
- Tune this flag on the scheduler process: https://github.com/EdSchouten/bazel-buildbarn/blob/master/cmd/bbb_scheduler/main.go#L22
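As a rough illustration of the second suggestion, the sketch below derives a --jobs value from the worker replica count. Setting jobs equal to the number of workers is just one simple choice, and reading "same order of magnitude" as "within a factor of ten" is an assumption. Both function names are hypothetical.

```python
def suggested_jobs_flag(worker_replicas):
    """Build a --jobs flag equal to the number of worker replicas."""
    if worker_replicas < 1:
        raise ValueError("need at least one worker")
    return f"--jobs={worker_replicas}"


def within_order_of_magnitude(jobs, workers):
    """Assumed reading: the larger value is less than 10x the smaller."""
    if jobs < 1 or workers < 1:
        raise ValueError("jobs and workers must be positive")
    return max(jobs, workers) < 10 * min(jobs, workers)


print(suggested_jobs_flag(8))
print(within_order_of_magnitude(16, 8))
print(within_order_of_magnitude(200, 8))
```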
BuildGrid
- Worker does not auto-reconnect to a new server (like buildfarm).
- Instance name (main) must match across the bazelrc (--instance_name=main), the server args (-scheduler main|ubuntu-scheduler:8981), and the worker args (bot --remote=http://server:8980 --parent=main host-tools).