bazel icon indicating copy to clipboard operation
bazel copied to clipboard

Optimize representation of runfiles in compact execution log

Open fmeum opened this issue 1 year ago • 1 comments

Runfiles trees are now represented with a custom RunfilesTree message in the compact execution log. This allows using InputSets to representing all artifacts staged at canonical locations, with only symlinks and root symlinks stored flattened and with explicit runfiles paths.

fmeum avatar Aug 16 '24 21:08 fmeum

I resolved the conflict and subsequently fixed the existing tests after the merge of the SymlinkAction PR.

fmeum avatar Aug 22 '24 15:08 fmeum

While adding more tests, I noticed that there is a pathological case that the new representation currently doesn't handle: If multiple artifacts at canonical locations collide, then the last in nested set order wins. Runfiles always use postorder, which I can almost emulate, but the distinction by type in InputSet loses some information about the order. For example, a directory might have come before a file, but InputSet doesn't allow me to see that.

@tjgq Since entry IDs are globally unique anyway, what do you think of merging file_ids, directory_ids and unresolved_symlink_ids into a single field that preserves the order?

fmeum avatar Aug 30 '24 14:08 fmeum

While adding more tests, I noticed that there is a pathological case that the new representation currently doesn't handle: If multiple artifacts at canonical locations collide, then the last in nested set order wins. Runfiles always use postorder, which I can almost emulate, but the distinction by type in InputSet loses some information about the order. For example, a directory might have come before a file, but InputSet doesn't allow me to see that.

@tjgq Since entry IDs are globally unique anyway, what do you think of merging file_ids, directory_ids and unresolved_symlink_ids into a single field that preserves the order?

Oof, it's a disruptive change for any tools already consuming the compact format, but that's exactly why I wanted the format to remain experimental until Bazel 8 is released. Let's do it.

tjgq avatar Aug 30 '24 14:08 tjgq

@tjgq I added a bunch of tests and addressed your comments. Since we were already making breaking changes to the format, I slightly tweaked how we represent IDs also in other parts of the protocol. I can revert that part of the change if you prefer.

fmeum avatar Sep 02 '24 12:09 fmeum

@bazel-io fork 7.4.0

fmeum avatar Sep 19 '24 11:09 fmeum

I'm making the following changes on import:

  • Use new field ID for InputSet.input_ids and Output.output_id instead of reusing an existing ID (makes it easier to write a polyglot tool, as discussed out of band)
  • Replace occurrences of bazel in tests with TestConstants.PRODUCT_NAME (which is blaze internally)

tjgq avatar Sep 19 '24 12:09 tjgq

The changes in this PR have been included in Bazel 7.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.4.0rc1. Thanks!

iancha1992 avatar Oct 11 '24 20:10 iancha1992