
Add `--incompatible_compact_repo_mapping_manifest`

fmeum opened this issue 11 months ago • 15 comments

With the flag enabled, <binary>.repo_mapping contains

+deps+*,aaa,_main
+deps+*,dep,+deps+dep1
+deps+*,dep1,+deps+dep1
+deps+*,dep2,+deps+dep2
+deps+*,dep3,+deps+dep3

instead of

+deps+dep1,aaa,_main
+deps+dep1,dep,+deps+dep1
+deps+dep1,dep1,+deps+dep1
+deps+dep1,dep2,+deps+dep2
+deps+dep1,dep3,+deps+dep3
+deps+dep2,aaa,_main
+deps+dep2,dep,+deps+dep1
+deps+dep2,dep1,+deps+dep1
+deps+dep2,dep2,+deps+dep2
+deps+dep2,dep3,+deps+dep3
...

for the deps module extension.
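The two listings carry the same information, provided a source repo ending in `*` is read as "every source repo whose canonical name starts with this prefix". The following is a minimal sketch that expands the compact lines back into the per-repo form. It assumes the `*`-as-prefix reading and the `source,apparent,canonical` column order shown above. `expand_compact_lines` is a hypothetical helper, not part of Bazel or any runfiles library.

```python
def expand_compact_lines(compact_lines, source_repos):
    """Expand compact repo mapping lines into one line per source repo.

    Hypothetical helper. Assumes a source field ending in '*' is a prefix
    wildcard and that `source_repos` lists the canonical names it covers.
    """
    expanded = []
    for line in compact_lines:
        source, apparent, canonical = line.split(",")
        if source.endswith("*"):
            prefix = source[:-1]  # e.g. "+deps+*" -> "+deps+"
            for repo in source_repos:
                if repo.startswith(prefix):
                    expanded.append(f"{repo},{apparent},{canonical}")
        else:
            expanded.append(line)  # exact entries pass through unchanged
    # Sorting reproduces the ordering of the uncompacted listing.
    return sorted(expanded)
```

For example, expanding the five `+deps+*` lines with `["+deps+dep1", "+deps+dep2"]` yields exactly the ten lines listed above before the `...`.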

Runfiles libraries have to be updated to find entries using the new format.

Work towards #24808

fmeum, Dec 29 '24 21:12

@Wyverald Should we ask runfiles libraries to perform a linear match on all lines with prefixes? That's O(num extensions), but avoids specifying the separator char or scheme.

fmeum, Dec 30 '24 08:12

@Wyverald Friendly ping

fmeum, Jan 30 '25 11:01

Before we commit to introducing a new manifest format, could you briefly explain to me why we're recording so many entries in the manifest? I vaguely remember that we try to trim entries down to just the ones that we actually include runfiles for. Does that actually end up being every single repo generated by the extension? (Is it because of Python source files?)

I really wish this was something that could transparently be taken care of by compression, but I guess that's a bit of a pipe dream.

Wyverald, Jan 30 '25 22:01

We are trimming down the target repos to those that provide runfiles, but for NPM and Python that's typically every extension repo.
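A rough line count shows why this matters. Assume, for illustration only, that an extension generates n repos that all provide runfiles and that each of them sees the same m apparent names (as every `+deps+` repo does in the example at the top, where m = 5). Then:

```latex
\text{lines}_{\text{full}} = n \cdot m,
\qquad
\text{lines}_{\text{compact}} = m
```

With n = m = 1000 that is 10^6 lines versus 10^3. Because m usually grows with n (each extension repo can see its siblings), the full form grows roughly quadratically with the number of extension repos, while the compact form grows only linearly.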

Most of those won't use a runfiles library, so if we tracked that (my original proposal had something like this, but we decided against it for being too complicated), we could potentially trim down the source repos. But if a ruleset for a dynamic language ever adopts repo mapped language imports using the runfiles library (rules_python has been discussing this at some point), even that wouldn't help.

Compression is a good fix for remote execution. I have a change out that lazily streams these files to the executor with BwoB, but that doesn't help for local builds.

fmeum, Jan 30 '25 22:01

I see. It somehow escaped me, but thinking about it again, for a top-level binary that depends on a lot of Python code, there's basically no way to "trim" anything here for any meaningful measure.

We should definitely tread carefully here -- changing the manifest format can be rather disruptive, especially since it's not versioned (so it basically always has to be forwards-compatible).

Wyverald, Feb 03 '25 22:02

> @Wyverald Should we ask runfiles libraries to perform a linear match on all lines with prefixes? That's O(num extensions), but avoids specifying the separator char or scheme.

Could you elaborate a bit what "a linear match on all lines with prefixes" means?

Wyverald, Feb 03 '25 22:02

> Could you elaborate a bit what "a linear match on all lines with prefixes" means?

Runfiles libraries have essentially two ways to look up mappings in the presence of wildcards. First, they try to look up an exact match for the source repo in some equivalent of a HashMap. If there is no such match, then they can:

  1. Iterate over all manifest lines with a wildcard character and check whether the source repo starts with their prefix. This takes time O(wildcard entries).
  2. Assume that prefixes always end at the last + of the source repo name. Then the wildcard entries can be preprocessed into a HashMap, and a lookup cuts the source repo name off after its last + and performs a single map lookup. This takes constant time, but requires knowledge about how prefixes are constructed.
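Both strategies can be sketched as follows. This is a minimal illustration, not code from any runfiles library. It assumes the `source,apparent,canonical` line format shown at the top, with a trailing `*` marking a prefix wildcard. `parse_manifest`, `lookup_linear`, `build_plus_index` and `lookup_by_last_plus` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Exact = Dict[Tuple[str, str], str]       # (source, apparent) -> canonical
Wildcard = List[Tuple[str, str, str]]    # (source_prefix, apparent, canonical)


def parse_manifest(lines) -> Tuple[Exact, Wildcard]:
    """Split manifest lines into exact entries and prefix (wildcard) entries."""
    exact: Exact = {}
    wildcard: Wildcard = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        source, apparent, canonical = line.split(",")
        if source.endswith("*"):
            wildcard.append((source[:-1], apparent, canonical))
        else:
            exact[(source, apparent)] = canonical
    return exact, wildcard


def lookup_linear(exact: Exact, wildcard: Wildcard,
                  source: str, apparent: str) -> Optional[str]:
    """Option 1: exact match first, then scan every wildcard entry."""
    if (source, apparent) in exact:
        return exact[(source, apparent)]
    for prefix, app, canonical in wildcard:  # O(wildcard entries)
        if app == apparent and source.startswith(prefix):
            return canonical
    return None


def build_plus_index(wildcard: Wildcard) -> Exact:
    """Option 2 preprocessing: key wildcard entries by their prefix.

    Only correct if every prefix ends at the last '+' of the source name.
    """
    return {(prefix, app): canonical for prefix, app, canonical in wildcard}


def lookup_by_last_plus(exact: Exact, plus_index: Exact,
                        source: str, apparent: str) -> Optional[str]:
    """Option 2: exact match first, then one map lookup on the cut-off name."""
    if (source, apparent) in exact:
        return exact[(source, apparent)]
    cut = source.rfind("+")
    if cut == -1:
        return None  # e.g. "_main" has no '+', so no wildcard can apply
    return plus_index.get((source[: cut + 1], apparent))
```

Option 2 bakes the `+` separator into the lookup. That is exactly the knowledge about the repo name format that option 1 avoids.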

fmeum, Feb 03 '25 22:02

I see. I would like to avoid encoding any knowledge about the repo name format whatsoever (you'd probably expect that from me at this point :)). On the runfiles library side, some tricks can be done to speed up the lookup (e.g. constructing a trie), so performance should still be good enough.

Wyverald, Feb 05 '25 20:02

The equivalent of a TreeMap is probably good enough. I will follow up with PRs for the main runfiles libraries.
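A TreeMap-style lookup can find the longest matching prefix without knowing where prefixes end. It relies only on the lexicographic order of the prefixes. The sketch below uses Python's `bisect` on a sorted list to mimic `TreeMap.floorKey`. It assumes that at most one wildcard prefix applies per source repo, and that the longest match wins. `WildcardIndex` is a hypothetical name, not an API of any runfiles library.

```python
import bisect
from typing import Dict, Iterable, List, Optional, Tuple


class WildcardIndex:
    """Sorted-prefix index standing in for a Java TreeMap (hypothetical).

    Makes no assumption about separator characters in repo names.
    """

    def __init__(self, entries: Iterable[Tuple[str, str, str]]):
        # entries: (source_prefix, apparent, canonical)
        by_prefix: Dict[str, Dict[str, str]] = {}
        for prefix, apparent, canonical in entries:
            by_prefix.setdefault(prefix, {})[apparent] = canonical
        self._prefixes: List[str] = sorted(by_prefix)
        self._by_prefix = by_prefix

    def _floor(self, key: str) -> Optional[str]:
        """Largest stored prefix <= key, like TreeMap.floorKey."""
        i = bisect.bisect_right(self._prefixes, key)
        return self._prefixes[i - 1] if i else None

    def longest_prefix(self, source: str) -> Optional[str]:
        key = source
        while True:
            cand = self._floor(key)
            if cand is None:
                return None
            if source.startswith(cand):
                return cand
            # Any stored prefix of `source` is also a prefix of the common
            # prefix of `cand` and `source`, so retry with that shorter key.
            n = 0
            while n < min(len(cand), len(source)) and cand[n] == source[n]:
                n += 1
            key = source[:n]

    def lookup(self, source: str, apparent: str) -> Optional[str]:
        prefix = self.longest_prefix(source)
        if prefix is None:
            return None
        return self._by_prefix[prefix].get(apparent)
```

Each retry strictly shortens the key, so a lookup needs at most one floor query per character of the source name. In practice a lookup usually needs a single query.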

fmeum, Feb 18 '25 12:02

Regarding comment from @fmeum

> language imports using the runfiles library (rules_python has been discussing this at some point)

We are also researching ways to lay out the files so that this would not be required (i.e. create a virtual env and place the files in a way that is natural to Python). Currently, no one is pursuing reading the runfiles manifest to implement an importlib alternative.

The virtual env approach is being researched here: https://github.com/bazelbuild/rules_python/pull/2617

aignas, Feb 27 '25 01:02

@Wyverald I resolved the conflicts, this should be good for another review.

fmeum, Mar 07 '25 08:03

@Wyverald Friendly ping, let's merge this so that runfiles library work can start :-)

fmeum, Mar 19 '25 09:03

Any chance we can get this in? I back-ported this PR so I could use it in my Java build, and for java_testsuite() it reduces the size of the repo mapping files from 500MB to 100K. That's per file; the build used to fill up my disk.

ob, Jun 09 '25 21:06

@bazel-io fork 8.3.0

fmeum, Jun 09 '25 22:06

Filed https://github.com/bazelbuild/bazel/issues/26262

fmeum, Jun 11 '25 11:06

FYI: I'm running into some mysterious cc_integration_test failures trying to submit (https://buildkite.com/bazel/google-bazel-presubmit/builds/92974#01978497-7ce4-45d2-816d-36f8088e33a9). Will dig a bit later.

Wyverald, Jun 18 '25 20:06