GitHub Action Bazel caching does not seem to work
Description
Hi. When I use rules_proto_grpc with a GitHub Actions cache, it looks like the protobuf library is rebuilt on every run (which is most of the build). I'm not sure why; my guess is that the protobuf compiler output is written to some special directory that isn't being cached. I tried looking through rules_proto_grpc to see if the code was doing anything special, but the code base is large. One odd thing is that I did not hit this issue when building protos directly (without rules_proto_grpc).
Q: which directory do the compiled files end up in for rules_proto_grpc? I can try to hack around this by caching that output.
I'm using rules_proto_grpc-3.1.0.
Analyzing: 68 targets (127 packages loaded, 8780 targets configured)
INFO: Analyzed 68 targets (182 packages loaded, 9284 targets configured).
INFO: Found 68 targets...
[0 / 3] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[26 / 220] Compiling src/google/protobuf/wire_format_lite.cc; 0s linux-sandbox ... (2 actions, 1 running)
[43 / 220] Compiling src/google/protobuf/generated_message_table_driven_lite.cc; 0s linux-sandbox ... (2 actions, 1 running)
[59 / 220] Compiling src/google/protobuf/compiler/csharp/csharp_reflection_class.cc; 0s linux-sandbox ... (2 actions running)
[72 / 220] Compiling src/google/protobuf/compiler/cpp/cpp_field.cc; 1s linux-sandbox ... (2 actions running)
[93 / 220] Compiling src/google/protobuf/compiler/objectivec/objectivec_enum_field.cc; 0s linux-sandbox ... (2 actions running)
[112 / 220] Compiling src/google/protobuf/descriptor.cc; 1s linux-sandbox ... (2 actions running)
[135 / 220] Compiling src/google/protobuf/compiler/java/java_message_builder_lite.cc; 1s linux-sandbox ... (2 actions running)
[207 / 291] GoCompilePkg external/org_golang_google_protobuf/reflect/protoreflect/protoreflect.a; 0s linux-sandbox ... (2 actions running)
[351 / 502] Compiling src/google/protobuf/compiler/cpp/cpp_file.cc [for host]; 2s linux-sandbox ... (2 actions running)
[399 / 502] Compiling src/google/protobuf/extension_set_heavy.cc [for host]; 0s linux-sandbox ... (2 actions running)
[442 / 502] Compiling src/google/protobuf/descriptor.pb.cc [for host]; 3s linux-sandbox ... (2 actions running)
[526 / 570] Building external/com_google_protobuf/libstruct_proto-speed.jar (1 source jar); 2s multiplex-worker ... (2 actions running)
INFO: From ProtoCompile proto/event/event_pb2.py:
Here's the cache step of my GitHub Action.
- name: Cache Bazel
  uses: actions/[email protected]
  env:
    cache-name: bazel-cache
  with:
    path: |
      ~/.cache/bazelisk
      ~/.cache/bazel
    # To write a new cache, we need to have a unique key.
    # TODO - try to get "-${{ hashFiles('WORKSPACE', '**/BUILD.bazel') }}" to work
    key: ${{ runner.os }}-${{ env.cache-name }}-${{ github.ref }}
    # We use restore-keys to find a recent cache to reuse.
    restore-keys:
      ${{ runner.os }}-${{ env.cache-name }}-
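As a side note on the directory question above: the compiled outputs, including protoc itself, never live inside the workspace; Bazel writes them under its output base. A rough way to cross-check which directories the cache step needs to cover is the sketch below, assuming a stock Linux setup where these default to paths under ~/.cache/bazel. Even when these directories are restored, actions can still re-run if their cache keys change, which is what the replies below dig into.

# Print the directories Bazel actually writes to; on Linux both typically
# default to locations under ~/.cache/bazel, so the cache step above should
# already cover them.
bazel info output_base        # per-workspace outputs, incl. the compiled protobuf/protoc
bazel info repository_cache   # cache of downloaded external repository archives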
I see this sometimes too and it's really difficult to track down. Essentially, Bazel thinks something has changed that requires rebuilding protobuf, and there can be many reasons for that. First off, try running with --incompatible_strict_action_env. This flag restricts how changes in environment variables can invalidate your cache, but it likely won't solve the problem on its own.
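For illustration, a minimal sketch of trying that flag, either on a single invocation or permanently via .bazelrc; the //... target pattern is just a placeholder for whatever the CI job builds:

# Use a static PATH for actions instead of inheriting it from the client
# environment, so per-run env differences no longer invalidate those actions.
bazel build //... --incompatible_strict_action_env

# Or make it the default for every build by adding this line to .bazelrc:
#   build --incompatible_strict_action_env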
The de facto way of tracing this sort of thing is to use --execution_log_json_file and find the exact action that causes the action graph to differ between runs. However, the tooling around this log was pretty lacking last time I tried, so separating cause from effect is not a simple process. See this doc, which explains the approach (even though you aren't using remote execution, the process is the same): https://docs.bazel.build/versions/main/remote-execution-caching-debug.html. Using bazel aquery may also be a better alternative, if it exposes the necessary info.
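A rough sketch of that workflow, assuming the same build is run twice (two CI runs, or locally with a clean in between); the file names and the //... pattern are placeholders:

# Capture the executed actions from two otherwise identical builds.
bazel build //... --execution_log_json_file=/tmp/exec1.json
bazel clean
bazel build //... --execution_log_json_file=/tmp/exec2.json

# The log entries are not ordered deterministically, so a raw diff is noisy,
# but grepping the protobuf actions for their environment and inputs usually
# shows what changed between runs.
diff /tmp/exec1.json /tmp/exec2.json | head

# aquery prints each action's inputs and environment without running it:
bazel aquery 'mnemonic("CppCompile", deps(//...))' | head -n 50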
Realistically, if it's protobuf getting spuriously rebuilt, it's probably an issue with the protobuf repo rather than with rules_proto_grpc. Having a look through their issues, this one may be related: https://github.com/protocolbuffers/protobuf/issues/6886. Basically, by using that use_default_shell_env option they are punching a hole in the Bazel sandbox and allowing env var changes to trash the cache. Since GitHub Actions (and other CI systems) intentionally have different env vars on every run, this may be why you see it so badly there.
Therefore, an interesting experiment to run would be to manually strip your env vars to a bare minimum before calling Bazel to build. Something like (pinched from here):
env -i HOME="$HOME" LC_CTYPE="${LC_ALL:-${LC_CTYPE:-$LANG}}" PATH="$PATH" USER="$USER" bazel ...
I've sent https://github.com/bazelbuild/rules_proto/pull/206 to finally fix this upstream, so we don't need to compile protoc at all.
Excellent! Is there anything that needs changing here to support this, or is it just a drop-in replacement?
I ask because we have a toolchain for protoc here, for reasons that I do not remember. Presumably this can be ditched and replaced with the official toolchain now in rules_proto?
Under bzlmod, the toolchain registration will be automatic so there's nothing to do. WORKSPACE users might have to add a line.
I still have to argue with Google about why this is the right way to do it. They seem to want to go the other direction and make the protobuf repo even more load-bearing.