bazel icon indicating copy to clipboard operation
bazel copied to clipboard

Check that digesting consumes the expected number of bytes.

Open benjaminp opened this issue 2 years ago • 7 comments

In Bazel, the time a file is stated to find its size can be very far from the time the file's digest is computed, which creates a window for shenanigans. Allow passing the expected size into Path.getDigest and raise an error there if the number of digested bytes doesn't match the previously seen size.

benjaminp avatar Sep 27 '23 18:09 benjaminp

If file size differs, the digest must be different. Can you share more details about the motivation/background for this PR?

coeuvre avatar Sep 28 '23 10:09 coeuvre

The potential problem is Bazel doesn't compute the file size by counting how many bytes are fed into the hash function. Rather, the file is stated.. Thus, there's a race between computing the size of the digest and computing the digest itself.

benjaminp avatar Sep 28 '23 14:09 benjaminp

In other words, I'm trying to add a best-effort consistency check.

benjaminp avatar Sep 28 '23 15:09 benjaminp

Internally, we embed the size into the digest (i.e. the file size can be derived from the digest) so I can understand the change. But I am not sure whether missing the file size could be a problem in practice.

Considering lots of effect will be needed to import this PR, I will let @meisterT make the decision.

coeuvre avatar Sep 29 '23 09:09 coeuvre

If we decide to add this, we should consider merging it with --experimental_guard_against_concurrent_changes (which only applies when uploading action results to a remote/disk cache). In other words, I don't think we should have two separate mechanisms to detect unexpected changes to build outputs.

tjgq avatar Sep 29 '23 12:09 tjgq

Steps can likely to taken to make this easier to merge if needed. Note I added a Path.getDigest overload taking a size; the no-argument version is preserved meaning no callers of Path.getDigest() will break. I did modify FileSystem.getDigest on the assumption that it is a less prominent interface than Path. If Google having many FileSystem implementations is a difficulty, I could similarly add an override for FileSystem.getDigest.

With regards to --experimental_guard_against_concurrent_changes (https://github.com/bazelbuild/bazel/issues/3360#issuecomment-1541332142), the problem is no one sets that flag. Unlike that flag, this change does not add any more I/O to Bazel; it's a simple internal comparison of two integers.

benjaminp avatar Sep 29 '23 15:09 benjaminp

Internally, we embed the size into the digest (i.e. the file size can be derived from the digest) so I can understand the change.

Alternatively, would there be a way to align the hash functions available in Bazel with this construction?

fmeum avatar Sep 30 '23 13:09 fmeum