bazel-remote
Add support for more digest functions
Generalize the cache to support multiple digest functions. This PR updates the remote_execution proto definitions and introduces a new interface, hashing.Hasher, that can be used to compute, validate and store blobs for a given DigestFunction.
All blobs are stored as <kind>/<digest function>/<path>, with <kind> being one of cas, cas.v2, ac or raw, <digest function> being the name of the digest function (e.g. blake3) and <path> being the current blob, sharded by prefix (e.g. f1/f1...). The exception to this rule is sha256, for which we drop the <digest function> component to maintain full backwards compatibility.
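The layout rule above can be sketched as a small path-building function. This is only an illustration under assumptions: blobPath is a hypothetical helper, not a function from this PR, the two-character shard prefix is inferred from the f1/f1... example, and real cache file names may carry more components than the bare hash.

```go
package main

import (
	"fmt"
	"path"
)

// blobPath builds "<kind>/<digest function>/<prefix>/<hash>".
// Hypothetical helper: the name and the two-character prefix are assumptions.
func blobPath(kind, digestFunction, hash string) string {
	prefix := hash
	if len(hash) >= 2 {
		prefix = hash[:2] // shard by the first two hex characters
	}
	if digestFunction == "sha256" {
		// sha256 keeps the old layout: no digest-function component.
		return path.Join(kind, prefix, hash)
	}
	return path.Join(kind, digestFunction, prefix, hash)
}

func main() {
	fmt.Println(blobPath("cas", "sha256", "f1d2d2f9")) // cas/f1/f1d2d2f9
	fmt.Println(blobPath("cas", "blake3", "f1d2d2f9")) // cas/blake3/f1/f1d2d2f9
}
```

Dropping the component only for sha256 means an existing cache directory stays readable without any migration.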
This change does not add any further digest functions, but more can be added as needed by adding a new type that implements hashing.Hasher and registering it with hashing.register (I tested with sha1 as well, but did not include it in this PR).
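To make the extension point more concrete, here is a minimal sketch of what an interface plus registry of this kind could look like. The method set of Hasher, the stdHasher type, and the registerDefaults and validate helpers are all assumptions made for illustration; the actual hashing.Hasher in this PR may be shaped differently.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// Hasher is a hypothetical stand-in for hashing.Hasher.
type Hasher interface {
	Name() string   // name used in paths and headers, e.g. "sha256"
	New() hash.Hash // fresh streaming hash state
}

// stdHasher adapts a standard-library hash constructor to Hasher.
type stdHasher struct {
	name    string
	newHash func() hash.Hash
}

func (h stdHasher) Name() string   { return h.name }
func (h stdHasher) New() hash.Hash { return h.newHash() }

var registry = map[string]Hasher{}

// register makes a digest function available by name.
func register(h Hasher) { registry[h.Name()] = h }

// registerDefaults registers the functions used in this sketch.
func registerDefaults() {
	register(stdHasher{name: "sha256", newHash: sha256.New})
	register(stdHasher{name: "sha1", newHash: sha1.New})
}

// validate reports whether data hashes to wantHex under the named function.
func validate(name string, data []byte, wantHex string) (bool, error) {
	h, ok := registry[name]
	if !ok {
		return false, fmt.Errorf("unsupported digest function %q", name)
	}
	state := h.New()
	state.Write(data)
	return hex.EncodeToString(state.Sum(nil)) == wantHex, nil
}

func main() {
	registerDefaults()
	ok, err := validate("sha1", []byte("abc"), "a9993e364706816aba3e25717850c26c9cd0d89d")
	fmt.Println(ok, err)
}
```

With this shape, supporting another function only requires one more register call with a type that satisfies the interface.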
In order to keep supporting other instances of bazel-remote as a backend proxy with the new digest functions, we now additionally set and read the X-Digest-Function header in each request.
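As a rough sketch of the header handling between a proxying instance and its backend: the helper names and the example URL below are hypothetical, and falling back to sha256 when the header is absent is an assumption about how older peers might be treated, not something stated in this PR.

```go
package main

import (
	"fmt"
	"net/http"
)

const digestFunctionHeader = "X-Digest-Function"

// newProxyRequest builds a GET for a backend instance and tags it with the
// digest function of the blob being requested. Hypothetical helper.
func newProxyRequest(url, digestFunction string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set(digestFunctionHeader, digestFunction)
	return req, nil
}

// digestFunctionOf reads the header on the receiving side.
// Assumption: a missing or empty header means sha256.
func digestFunctionOf(r *http.Request) string {
	if v := r.Header.Get(digestFunctionHeader); v != "" {
		return v
	}
	return "sha256"
}

func main() {
	req, err := newProxyRequest("http://backend.example:8080/cas/f1d2d2f9", "blake3")
	if err != nil {
		panic(err)
	}
	fmt.Println(digestFunctionOf(req))
}
```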
Fixes https://github.com/buchgr/bazel-remote/issues/710
I have a few concerns about landing this feature, but since you have a prototype, is it something we can get some benchmark numbers for? How much does switching from sha256 to blake3 improve build times?
What are your concerns? There's a benchmark that can be run with bazel run cache/hashing:go_default_test -- --test.bench . I'll include my results here:
Linux x86_64
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
BenchmarkHashers/1B_BLAKE3-16 1000000000 0.0000057 ns/op
BenchmarkHashers/1B_SHA256-16 1000000000 0.0000018 ns/op
BenchmarkHashers/2B_BLAKE3-16 1000000000 0.0000048 ns/op
BenchmarkHashers/2B_SHA256-16 1000000000 0.0000015 ns/op
BenchmarkHashers/4B_BLAKE3-16 1000000000 0.0000044 ns/op
BenchmarkHashers/4B_SHA256-16 1000000000 0.0000011 ns/op
BenchmarkHashers/8B_BLAKE3-16 1000000000 0.0000030 ns/op
BenchmarkHashers/8B_SHA256-16 1000000000 0.0000014 ns/op
BenchmarkHashers/16B_BLAKE3-16 1000000000 0.0000037 ns/op
BenchmarkHashers/16B_SHA256-16 1000000000 0.0000009 ns/op
BenchmarkHashers/32B_BLAKE3-16 1000000000 0.0000062 ns/op
BenchmarkHashers/32B_SHA256-16 1000000000 0.0000009 ns/op
BenchmarkHashers/64B_BLAKE3-16 1000000000 0.0000036 ns/op
BenchmarkHashers/64B_SHA256-16 1000000000 0.0000011 ns/op
BenchmarkHashers/128B_BLAKE3-16 1000000000 0.0000091 ns/op
BenchmarkHashers/128B_SHA256-16 1000000000 0.0000018 ns/op
BenchmarkHashers/256B_BLAKE3-16 1000000000 0.0000031 ns/op
BenchmarkHashers/256B_SHA256-16 1000000000 0.0000009 ns/op
BenchmarkHashers/512B_BLAKE3-16 1000000000 0.0000052 ns/op
BenchmarkHashers/512B_SHA256-16 1000000000 0.0000013 ns/op
BenchmarkHashers/1KB_BLAKE3-16 1000000000 0.0000058 ns/op
BenchmarkHashers/1KB_SHA256-16 1000000000 0.0000019 ns/op
BenchmarkHashers/2KB_BLAKE3-16 1000000000 0.0000155 ns/op
BenchmarkHashers/2KB_SHA256-16 1000000000 0.0000030 ns/op
BenchmarkHashers/4KB_BLAKE3-16 1000000000 0.0000157 ns/op
BenchmarkHashers/4KB_SHA256-16 1000000000 0.0000047 ns/op
BenchmarkHashers/8KB_BLAKE3-16 1000000000 0.0000235 ns/op
BenchmarkHashers/8KB_SHA256-16 1000000000 0.0000073 ns/op
BenchmarkHashers/16KB_BLAKE3-16 1000000000 0.0000087 ns/op
BenchmarkHashers/16KB_SHA256-16 1000000000 0.0000148 ns/op
BenchmarkHashers/32KB_BLAKE3-16 1000000000 0.0000131 ns/op
BenchmarkHashers/32KB_SHA256-16 1000000000 0.0000261 ns/op
BenchmarkHashers/64KB_BLAKE3-16 1000000000 0.0000561 ns/op
BenchmarkHashers/64KB_SHA256-16 1000000000 0.0000554 ns/op
BenchmarkHashers/128KB_BLAKE3-16 1000000000 0.0001044 ns/op
BenchmarkHashers/128KB_SHA256-16 1000000000 0.0000985 ns/op
BenchmarkHashers/256KB_BLAKE3-16 1000000000 0.0002238 ns/op
BenchmarkHashers/256KB_SHA256-16 1000000000 0.0001969 ns/op
BenchmarkHashers/512KB_BLAKE3-16 1000000000 0.0002102 ns/op
BenchmarkHashers/512KB_SHA256-16 1000000000 0.0003972 ns/op
BenchmarkHashers/1MB_BLAKE3-16 1000000000 0.0003518 ns/op
BenchmarkHashers/1MB_SHA256-16 1000000000 0.0007869 ns/op
BenchmarkHashers/2MB_BLAKE3-16 1000000000 0.0006666 ns/op
BenchmarkHashers/2MB_SHA256-16 1000000000 0.001581 ns/op
BenchmarkHashers/4MB_BLAKE3-16 1000000000 0.001203 ns/op
BenchmarkHashers/4MB_SHA256-16 1000000000 0.003166 ns/op
BenchmarkHashers/8MB_BLAKE3-16 1000000000 0.002346 ns/op
BenchmarkHashers/8MB_SHA256-16 1000000000 0.006283 ns/op
BenchmarkHashers/16MB_BLAKE3-16 1000000000 0.004486 ns/op
BenchmarkHashers/16MB_SHA256-16 1000000000 0.01260 ns/op
BenchmarkHashers/32MB_BLAKE3-16 1000000000 0.009684 ns/op
BenchmarkHashers/32MB_SHA256-16 1000000000 0.02538 ns/op
BenchmarkHashers/64MB_BLAKE3-16 1000000000 0.02098 ns/op
BenchmarkHashers/64MB_SHA256-16 1000000000 0.05074 ns/op
BenchmarkHashers/128MB_BLAKE3-16 1000000000 0.04212 ns/op
BenchmarkHashers/128MB_SHA256-16 1000000000 0.1015 ns/op
BenchmarkHashers/256MB_BLAKE3-16 1000000000 0.08430 ns/op
BenchmarkHashers/256MB_SHA256-16 1000000000 0.2031 ns/op
BenchmarkHashers/512MB_BLAKE3-16 1000000000 0.1671 ns/op
BenchmarkHashers/512MB_SHA256-16 1000000000 0.4062 ns/op
BenchmarkHashers/1GB_BLAKE3-16 1000000000 0.3386 ns/op
BenchmarkHashers/1GB_SHA256-16 1000000000 0.8117 ns/op
Linux aarch64
goos: linux
goarch: arm64
BenchmarkHashers/1B_BLAKE3-16 1000000000 0.0000062 ns/op
BenchmarkHashers/1B_SHA256-16 1000000000 0.0000021 ns/op
BenchmarkHashers/2B_BLAKE3-16 1000000000 0.0000094 ns/op
BenchmarkHashers/2B_SHA256-16 1000000000 0.0000022 ns/op
BenchmarkHashers/4B_BLAKE3-16 1000000000 0.0000057 ns/op
BenchmarkHashers/4B_SHA256-16 1000000000 0.0000012 ns/op
BenchmarkHashers/8B_BLAKE3-16 1000000000 0.0000062 ns/op
BenchmarkHashers/8B_SHA256-16 1000000000 0.0000021 ns/op
BenchmarkHashers/16B_BLAKE3-16 1000000000 0.0000054 ns/op
BenchmarkHashers/16B_SHA256-16 1000000000 0.0000021 ns/op
BenchmarkHashers/32B_BLAKE3-16 1000000000 0.0000070 ns/op
BenchmarkHashers/32B_SHA256-16 1000000000 0.0000018 ns/op
BenchmarkHashers/64B_BLAKE3-16 1000000000 0.0000063 ns/op
BenchmarkHashers/64B_SHA256-16 1000000000 0.0000018 ns/op
BenchmarkHashers/128B_BLAKE3-16 1000000000 0.0000072 ns/op
BenchmarkHashers/128B_SHA256-16 1000000000 0.0000019 ns/op
BenchmarkHashers/256B_BLAKE3-16 1000000000 0.0000073 ns/op
BenchmarkHashers/256B_SHA256-16 1000000000 0.0000025 ns/op
BenchmarkHashers/512B_BLAKE3-16 1000000000 0.0000090 ns/op
BenchmarkHashers/512B_SHA256-16 1000000000 0.0000021 ns/op
BenchmarkHashers/1KB_BLAKE3-16 1000000000 0.0000100 ns/op
BenchmarkHashers/1KB_SHA256-16 1000000000 0.0000023 ns/op
BenchmarkHashers/2KB_BLAKE3-16 1000000000 0.0000157 ns/op
BenchmarkHashers/2KB_SHA256-16 1000000000 0.0000029 ns/op
BenchmarkHashers/4KB_BLAKE3-16 1000000000 0.0000239 ns/op
BenchmarkHashers/4KB_SHA256-16 1000000000 0.0000046 ns/op
BenchmarkHashers/8KB_BLAKE3-16 1000000000 0.0000366 ns/op
BenchmarkHashers/8KB_SHA256-16 1000000000 0.0000064 ns/op
BenchmarkHashers/16KB_BLAKE3-16 1000000000 0.0000664 ns/op
BenchmarkHashers/16KB_SHA256-16 1000000000 0.0000125 ns/op
BenchmarkHashers/32KB_BLAKE3-16 1000000000 0.0001240 ns/op
BenchmarkHashers/32KB_SHA256-16 1000000000 0.0000227 ns/op
BenchmarkHashers/64KB_BLAKE3-16 1000000000 0.0002486 ns/op
BenchmarkHashers/64KB_SHA256-16 1000000000 0.0000433 ns/op
BenchmarkHashers/128KB_BLAKE3-16 1000000000 0.0004805 ns/op
BenchmarkHashers/128KB_SHA256-16 1000000000 0.0000853 ns/op
BenchmarkHashers/256KB_BLAKE3-16 1000000000 0.0009378 ns/op
BenchmarkHashers/256KB_SHA256-16 1000000000 0.0001688 ns/op
BenchmarkHashers/512KB_BLAKE3-16 1000000000 0.001875 ns/op
BenchmarkHashers/512KB_SHA256-16 1000000000 0.0003356 ns/op
BenchmarkHashers/1MB_BLAKE3-16 1000000000 0.003737 ns/op
BenchmarkHashers/1MB_SHA256-16 1000000000 0.0006701 ns/op
BenchmarkHashers/2MB_BLAKE3-16 1000000000 0.007473 ns/op
BenchmarkHashers/2MB_SHA256-16 1000000000 0.001350 ns/op
BenchmarkHashers/4MB_BLAKE3-16 1000000000 0.01493 ns/op
BenchmarkHashers/4MB_SHA256-16 1000000000 0.002690 ns/op
BenchmarkHashers/8MB_BLAKE3-16 1000000000 0.02988 ns/op
BenchmarkHashers/8MB_SHA256-16 1000000000 0.005363 ns/op
BenchmarkHashers/16MB_BLAKE3-16 1000000000 0.05984 ns/op
BenchmarkHashers/16MB_SHA256-16 1000000000 0.01073 ns/op
BenchmarkHashers/32MB_BLAKE3-16 1000000000 0.1200 ns/op
BenchmarkHashers/32MB_SHA256-16 1000000000 0.02146 ns/op
BenchmarkHashers/64MB_BLAKE3-16 1000000000 0.2400 ns/op
BenchmarkHashers/64MB_SHA256-16 1000000000 0.04291 ns/op
BenchmarkHashers/128MB_BLAKE3-16 1000000000 0.4803 ns/op
BenchmarkHashers/128MB_SHA256-16 1000000000 0.08592 ns/op
BenchmarkHashers/256MB_BLAKE3-16 1000000000 0.9611 ns/op
BenchmarkHashers/256MB_SHA256-16 1000000000 0.1717 ns/op
BenchmarkHashers/512MB_BLAKE3-16 1 1915870280 ns/op
BenchmarkHashers/512MB_SHA256-16 1000000000 0.3437 ns/op
BenchmarkHashers/1GB_BLAKE3-16 1 3829768354 ns/op
BenchmarkHashers/1GB_SHA256-16 1000000000 0.6874 ns/op
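For anyone who wants to try a comparable measurement without this PR checked out, here is a minimal sketch of a size-parameterized hash benchmark using only the Go standard library. It is not the benchmark from cache/hashing: the helper names are hypothetical, and SHA-1 stands in as the second function because BLAKE3 is not in the standard library. Each benchmark body hashes the buffer b.N times and reports throughput through b.SetBytes.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"fmt"
	"hash"
	"testing"
)

// hashers maps a display name to a hash constructor (stand-ins only).
var hashers = map[string]func() hash.Hash{
	"SHA1":   sha1.New,
	"SHA256": sha256.New,
}

// benchHash returns a benchmark body that hashes size bytes per iteration.
func benchHash(newHash func() hash.Hash, size int) func(b *testing.B) {
	return func(b *testing.B) {
		data := make([]byte, size)
		b.SetBytes(int64(size)) // lets the result include MB/s
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			h := newHash()
			h.Write(data)
			h.Sum(nil)
		}
	}
}

// measure runs one benchmark outside of "go test".
func measure(name string, size int) testing.BenchmarkResult {
	return testing.Benchmark(benchHash(hashers[name], size))
}

func main() {
	for _, size := range []int{1 << 10, 1 << 20} {
		for _, name := range []string{"SHA1", "SHA256"} {
			fmt.Printf("%dB_%s\t%s\n", size, name, measure(name, size))
		}
	}
}
```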
- sha256 is still supported, so there's no regression.
- The server can support all hashing algorithms at the same time; it is the client that decides what to use in its requests. If necessary, we can introduce a flag to allow banning some functions on the server side (e.g. if someone intentionally does not want to support blake3 because they are running on arm).
- This opens the door to more digest functions. SHA256TREE is very promising, although not yet available in bazel. Adding it to bazel-remote would be trivial once it is available; the complexity would just be in the implementation of the hashing algorithm itself.
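The server-side ban mentioned in the second point could look roughly like the sketch below. No such flag exists in this PR; the comma-separated value format and the parseDisabled and checkAllowed names are purely hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDisabled turns a comma-separated value of a hypothetical flag
// (e.g. "blake3,sha1") into a set of disabled digest function names.
func parseDisabled(flagValue string) map[string]bool {
	disabled := map[string]bool{}
	for _, name := range strings.Split(flagValue, ",") {
		name = strings.ToLower(strings.TrimSpace(name))
		if name != "" {
			disabled[name] = true
		}
	}
	return disabled
}

// checkAllowed rejects a request whose digest function has been disabled.
func checkAllowed(disabled map[string]bool, digestFunction string) error {
	if disabled[strings.ToLower(digestFunction)] {
		return fmt.Errorf("digest function %q is disabled on this server", digestFunction)
	}
	return nil
}

func main() {
	disabled := parseDisabled("blake3")
	fmt.Println(checkAllowed(disabled, "sha256")) // <nil>
	fmt.Println(checkAllowed(disabled, "blake3"))
}
```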
What are your concerns?
I try to be conservative when it comes to changing the cache directory format, because it can cause trouble for people, e.g. those trying out a new bazel-remote version and then switching back to a slightly older version. It's something we can do if there is a good need; it just needs to be thought through pretty well first.
There's a benchmark that can be run with ...
I was thinking of a more real-world benchmark, like comparing hot and cold cache builds of some appropriately sized project using bazel with blake3 and sha256 (i.e. not so large that it becomes a pain for us to run). What kind of projects does it help with? And by how much?
@mostynb I've seen most improvements for targets that produce large files, like deployables (fat jars or binaries, docker images). If you can suggest one or two OSS projects to test against, I'll happily run the tests. Alternatively, I can propose to land this change without adding support for blake3 and only keep the part that makes the cache more general with respect to the hashing function used. This will add support for the new DigestFunction fields in the bre protocol, which is a step forward but does not immediately change the folder structure, since sha256 will keep using the cache folder as before. We can then discuss separately whether blake3 is sufficiently better to justify adding support for it. WDYT?
@mostynb I've seen most improvements for targets that produce large files, like deployables (fat jars or binaries, docker images). If you can suggest one or two OSS projects to test against I'll happily run the tests.
This might be difficult, but probably worth the effort. Last time I tried something similar I had trouble finding many open-source projects that used bazel and worked with the latest bazel version. How about bazel itself as the first test?
Alternatively, I can propose to land this change without adding support for blake3 and only keep the part that makes the cache more general with respect to the hashing function used. This will add support for the new DigestFunction fields in the bre protocol, which is a step forward but does not immediately change the folder structure, since sha256 will keep using the cache folder as before. We can then discuss separately whether blake3 is sufficiently better to justify adding support for it. WDYT?
I think we should wait to see how blake3 performs first.
JC - were you able to run the tests against any OSS projects? Did BLAKE3 end up helping?
@jackwellsxyz @mostynb sorry, I've been busy and did not get around to running any tests. I'll try to find some time in the next few days.
Sorry, this took longer than expected. I tried testing it a bit with bazel itself, as @mostynb suggested, and did not notice a substantial difference (just 1-2% faster). I tested both fully uncached builds and fully cached builds. I've also tested the case where outputs are already built locally (i.e. they're in bazel-out/) but the server has to restart, as my understanding is that bazel will have to compute the hashes of all the local outputs, but in that case there was no significant difference either.
I think there's still logic in adding this functionality imo. I'm working on a repo that enforces BLAKE3 and so I have to set my bazelrc to explicitly use SHA256. Issue there is that I can't use Bazel module lock files since bazel "updates" all of the BLAKE3 bzlTransitiveDigest hashes to SHA256 ones during the pre-commit checks that use bazel to run things like detekt on files for check-in.
I regret that I found this PR only after I had completed sha512 support based on https://github.com/buchgr/bazel-remote/pull/175. https://github.com/hunshcn/bazel-remote/tree/feat/cas-md5-sha1-sha512
Is there an estimated landing time? I hope to add sha512 support on top of this, because the npm dependencies of rules_js only support sha512.
@mostynb do you think we can land this change to support other digest functions?
I am reluctant to land this change in the near future, but having said that I will try to find some time next week to read through this again. The monthly REAPI working group meeting is tomorrow, that will take up a fair bit of my bazel-remote time budget this week.
any update?
bazel 7.2.0 has been released.
https://github.com/bazelbuild/bazel/pull/21996
The Remote Asset downloader supports digest functions now.
@mostynb
@mostynb did you get around to reviewing this? Would love to hear your feedback to get this to a mergeable state.