rules_k8s
rules_k8s copied to clipboard
Dependency on volatile-status.txt breaks remote cache
The Bazel docs state:
Bazel pretends that the volatile file never changes
However, that is untrue when it comes to the remote cache.
Our CI environment runs e2e tests that depend on a k8s_object
on every run even when there are no changes due to the interaction between rules_k8s, stamping, and the remote_cache.
I'd like to propose dropping the dependency on the volatile file, while retaining the stable file, or at least making it optional as a workaround since the Bazel maintainers are pushing back on fixing the underlying issue.
I found that depending on the .substituted.yaml
output file is a workaround to this remote caching issue. The downside is that there may be cases when the test doesn't run due to some changes in the rule related to stamping.
General best practice here is to avoid stamping -- stamping is by definition volatile.
Another practice we use over in kubernetes/test-infra is to separate out the stamp layer into its own final layer.
AKA instead of making the final stamped layer include a large go binary in it, we put this into an intermediary layer.
https://github.com/kubernetes/test-infra/blob/35cfa7b7d66098fabedb8fb1c0e15be29088971e/prow/def.bzl#L26-L49
In other words instead of using the rules_docker container_image directly we wrap this into a prow_image macro which separates out the stamped layer from the large binary layer.
This way 99% of the image layers remain reproducible (and therefore cached) and just the tiny final layer with stamp info changes between builds (assuming the image hasn't changed)
@mariusgrigoriu and @fejta I created https://github.com/bazelbuild/rules_k8s/pull/592 as another attempt at addressing this issue. It exposes the stamp_srcs
attribute as a means to override volatile-status.txt
(and stable-status.txt
if desired).
That looks like a creative solution!
We've settled into adding the .substituted.yaml
output and a custom resolver as data dependencies on e2e tests for our project. The test uses those dependencies only to trigger e2e execution. The custom resolver substitutes values like cluster and namespace, which are different per pipeline run so we don't run e2e with every commit.
I think the PR might simplify things a bit so we can just have the k8s_object
rule as the data dependency as long as our resolver gets its values from sources other than the stable and volatile status files. If we did want to use those files, we might be stuck either way, but it would require some testing.
Found a use case where depending on the .substituted.yaml
is problematic and that's with k8s_objects
. There isn't a single file to depend upon so the cleanest and easiest way would be to depend on the rule itself, so looks like #592 could be useful with that.
Just tested PR #592. I wasn't able to find a way to include only the stable status file as I don't know what the label for the generated stabel-status.txt
is. However, I was able to not depend on any stamp files by referencing an empty filegroup. That said, it would be nicer if you could set stamp_srcs=[]
or stamp=False
to accomplish that goal. All in all, this seems to work if you don't want to rely on any of bazel's stamps, including the stable ones.
@mariusgrigoriu I am using the following approach, which concatenates and filters out values from stable-status and volatile-status:
genrule(
name = "all-status",
visibility = ["//visibility:public"],
outs = [
"all-status.txt",
],
cmd = ("cat bazel-out/stable-status.txt bazel-out/volatile-status.txt" +
"| grep -Ev '(BRANCH_TAIL|BUILD_TIMESTAMP|BUILD_HOST|BUILD_NUMBER|GIT_COMMIT_BODY)' > $(location :all-status.txt) "),
stamp = True,
)
k8s_object(
name = "foo",
...
stamp_srcs = ["//:all-status"],
)
All the best!
The above example helps a lot. I think this works good enough. What do you think about including stamp_srcs
in the k8s_default
? I'm finding that we want every rule to have it, so it would be nice if we could set it as a default. Happy to submit PR / issue on it.