Optimize representation of runfiles in compact execution log
Runfiles trees are now represented with a custom RunfilesTree message in the compact execution log. This allows using InputSets to representing all artifacts staged at canonical locations, with only symlinks and root symlinks stored flattened and with explicit runfiles paths.
I resolved the conflict and subsequently fixed the existing tests after the merge of the SymlinkAction PR.
While adding more tests, I noticed that there is a pathological case that the new representation currently doesn't handle: If multiple artifacts at canonical locations collide, then the last in nested set order wins. Runfiles always use postorder, which I can almost emulate, but the distinction by type in InputSet loses some information about the order. For example, a directory might have come before a file, but InputSet doesn't allow me to see that.
@tjgq Since entry IDs are globally unique anyway, what do you think of merging file_ids, directory_ids and unresolved_symlink_ids into a single field that preserves the order?
While adding more tests, I noticed that there is a pathological case that the new representation currently doesn't handle: If multiple artifacts at canonical locations collide, then the last in nested set order wins. Runfiles always use postorder, which I can almost emulate, but the distinction by type in
InputSetloses some information about the order. For example, a directory might have come before a file, butInputSetdoesn't allow me to see that.@tjgq Since entry IDs are globally unique anyway, what do you think of merging
file_ids,directory_idsandunresolved_symlink_idsinto a single field that preserves the order?
Oof, it's a disruptive change for any tools already consuming the compact format, but that's exactly why I wanted the format to remain experimental until Bazel 8 is released. Let's do it.
@tjgq I added a bunch of tests and addressed your comments. Since we were already making breaking changes to the format, I slightly tweaked how we represent IDs also in other parts of the protocol. I can revert that part of the change if you prefer.
@bazel-io fork 7.4.0
I'm making the following changes on import:
- Use new field ID for
InputSet.input_idsandOutput.output_idinstead of reusing an existing ID (makes it easier to write a polyglot tool, as discussed out of band) - Replace occurrences of
bazelin tests withTestConstants.PRODUCT_NAME(which isblazeinternally)
The changes in this PR have been included in Bazel 7.4.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.4.0rc1. Thanks!