Add `--incompatible_compact_repo_mapping_manifest`
With the flag enabled, `<binary>.repo_mapping` contains:
+deps+*,aaa,_main
+deps+*,dep,+deps+dep1
+deps+*,dep1,+deps+dep1
+deps+*,dep2,+deps+dep2
+deps+*,dep3,+deps+dep3
instead of
+deps+dep1,aaa,_main
+deps+dep1,dep,+deps+dep1
+deps+dep1,dep1,+deps+dep1
+deps+dep1,dep2,+deps+dep2
+deps+dep1,dep3,+deps+dep3
+deps+dep2,aaa,_main
+deps+dep2,dep,+deps+dep1
+deps+dep2,dep1,+deps+dep1
+deps+dep2,dep2,+deps+dep2
+deps+dep2,dep3,+deps+dep3
...
for the `deps` module extension.
Runfiles libraries have to be updated to find entries using the new format.
Work towards #24808
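To make the equivalence between the two formats concrete, here is a minimal Python sketch that expands the compact lines back into the verbose form. It assumes that a source field ending in `*` stands for every known repo whose canonical name starts with the text before the `*`, and that the caller supplies the list of canonical repo names. `expand` and its parameters are hypothetical and not part of Bazel or any runfiles library.

```python
"""Sketch: expand a compact repo mapping manifest into the verbose form.

Assumptions (not taken from Bazel's code): a source field ending in "*"
covers every known repo whose canonical name starts with the text before
the "*", and the caller supplies the list of known canonical repo names.
"""


def expand(compact_lines, known_repos):
    expanded = []
    for line in compact_lines:
        source, apparent, target = line.split(",")
        if source.endswith("*"):
            prefix = source[:-1]
            # Emit one verbose entry per known repo covered by the wildcard.
            for repo in sorted(known_repos):
                if repo.startswith(prefix):
                    expanded.append(f"{repo},{apparent},{target}")
        else:
            # Non-wildcard lines are already in the verbose form.
            expanded.append(line)
    return sorted(expanded)


compact = [
    "+deps+*,aaa,_main",
    "+deps+*,dep,+deps+dep1",
    "+deps+*,dep1,+deps+dep1",
    "+deps+*,dep2,+deps+dep2",
    "+deps+*,dep3,+deps+dep3",
]
repos = ["+deps+dep1", "+deps+dep2", "+deps+dep3"]
for entry in expand(compact, repos):
    print(entry)
```

With three repos in the extension, the five compact lines expand to fifteen verbose ones. The sorted output starts with exactly the `+deps+dep1` and `+deps+dep2` lines listed above, which is why the verbose file grows with (repos in extension) × (visible repos).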
@Wyverald Should we ask runfiles libraries to perform a linear match on all lines with prefixes? That's O(num extensions), but avoids specifying the separator char or scheme.
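A rough Python sketch of the proposed linear match, assuming a source field ending in `*` marks a prefix pattern and that `*` never occurs in canonical repo names. Wildcard lines are grouped by prefix, so the fallback scan is O(number of distinct prefixes), i.e. roughly O(num extensions). The class name and the `,deps,deps+` sample line are hypothetical; only the `+deps+*` lines come from this PR.

```python
"""Sketch: repo mapping lookup with a linear scan over wildcard prefixes.

Hypothetical names. Assumes a source field ending in "*" is a prefix
pattern and that "*" never appears in canonical repo names.
"""


class LinearRepoMapping:
    def __init__(self, manifest_text):
        self.exact = {}      # (source, apparent) -> target
        self.wildcards = {}  # source prefix -> {apparent: target}
        for line in manifest_text.splitlines():
            if not line:
                continue
            source, apparent, target = line.split(",")
            if source.endswith("*"):
                self.wildcards.setdefault(source[:-1], {})[apparent] = target
            else:
                self.exact[(source, apparent)] = target

    def lookup(self, source, apparent):
        # Fast path: exact match in a hash map.
        if (source, apparent) in self.exact:
            return self.exact[(source, apparent)]
        # Slow path: scan every wildcard prefix. This only relies on the
        # entry being a prefix, not on how repo names are constructed.
        for prefix, mapping in self.wildcards.items():
            if source.startswith(prefix) and apparent in mapping:
                return mapping[apparent]
        return None


manifest = """\
,deps,deps+
+deps+*,aaa,_main
+deps+*,dep,+deps+dep1
"""
m = LinearRepoMapping(manifest)
print(m.lookup("+deps+dep2", "dep"))
```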
@Wyverald Friendly ping
Before we commit to introducing a new manifest format, could you briefly explain to me why we're recording so many entries in the manifest? I vaguely remember that we try to trim entries down to just the ones that we actually include runfiles for. Does that actually end up being every single repo generated by the extension? (Is it because of Python source files?)
I really wish this was something that could transparently be taken care of by compression, but I guess that's a bit of a pipe dream.
We are trimming down the target repos to those that provide runfiles, but for NPM and Python that's typically every extension repo.
Most of those won't use a runfiles library, so if we tracked that (my original proposal had something like this, but we decided against it for being too complicated), we could potentially trim down the source repos. But if a ruleset for a dynamic language ever adopts repo mapped language imports using the runfiles library (rules_python has been discussing this at some point), even that wouldn't help.
Compression is a good fix for remote execution. I have a change out that lazily streams these files to the executor with BwoB, but that doesn't help for local builds.
I see. It somehow escaped me, but thinking about it again, for a top-level binary that depends on a lot of Python code, there's basically no way to "trim" anything here to any meaningful degree.
We should definitely tread carefully here -- changing the manifest format can be rather disruptive, especially since it's not versioned (so it basically always has to be forwards-compatible).
Could you elaborate a bit what "a linear match on all lines with prefixes" means?
Runfiles libraries have essentially two ways to look up mappings in the presence of wildcards. First, they try to look up an exact match for the source repo in some equivalent of a `HashMap`. If there is no such match, then either:
- Iterate over all manifest lines with a wildcard character and check whether they represent a prefix of the source repo. This takes O(wildcard entries) time.
- Assume that prefixes are cut off at the last `+`. Then the wildcard entries can be preprocessed into a `HashMap` in which entries can be looked up directly by cutting off the source repo at the last `+` and performing a map lookup. This takes constant time, but requires knowledge about how prefixes are constructed.
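For comparison, a minimal Python sketch of the second option, assuming (as described above) that wildcard prefixes always end at the last `+` of the source repo name. This is exactly the knowledge about the repo name format that the linear scan avoids. The class name is hypothetical.

```python
"""Sketch: wildcard lookup that cuts the source repo at its last "+".

Assumes every wildcard prefix ends at the last "+" of the repo names it
covers. LastPlusRepoMapping is a hypothetical name.
"""


class LastPlusRepoMapping:
    def __init__(self, manifest_text):
        self.exact = {}      # (source, apparent) -> target
        self.wildcards = {}  # prefix ending in "+" -> {apparent: target}
        for line in manifest_text.splitlines():
            if not line:
                continue
            source, apparent, target = line.split(",")
            if source.endswith("*"):
                self.wildcards.setdefault(source[:-1], {})[apparent] = target
            else:
                self.exact[(source, apparent)] = target

    def lookup(self, source, apparent):
        if (source, apparent) in self.exact:
            return self.exact[(source, apparent)]
        cut = source.rfind("+")
        if cut < 0:
            return None  # No "+" at all, so no wildcard can apply.
        # Constant-time lookup, but only correct if prefixes really end here.
        mapping = self.wildcards.get(source[: cut + 1])
        return None if mapping is None else mapping.get(apparent)


m = LastPlusRepoMapping("+deps+*,dep,+deps+dep1\n+deps+*,aaa,_main\n")
print(m.lookup("+deps+dep2", "dep"))
```

The fragility is visible in the cut: a source repo such as `+deps+a+b` is cut to `+deps+a+`, which matches no registered prefix, so the lookup misses even though `+deps+` is a prefix of it.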
I see. I would like to avoid encoding any knowledge about the repo name format whatsoever (you'd probably expect that from me at this point :)). On the runfiles library side, some tricks can be done to speed up the lookup (e.g. constructing a trie), so performance should still be good enough.
The equivalent of a TreeMap is probably good enough. I will follow up with PRs for the main runfiles libraries.
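Python has no built-in `TreeMap`, but a sorted list plus `bisect` gives the equivalent of Java's `TreeMap.floorEntry`. A minimal sketch, assuming no wildcard prefix is itself a prefix of another wildcard prefix: under that assumption, if any prefix matches the source repo, it must be the greatest prefix that sorts at or before it, so a single binary search suffices. The class name is hypothetical; if prefixes could nest, a trie (as suggested above) handles that case more directly.

```python
"""Sketch: TreeMap-style wildcard lookup using a sorted list and bisect.

Assumes no wildcard prefix is a prefix of another wildcard prefix, so the
floor entry is the only possible match. SortedPrefixRepoMapping is a
hypothetical name.
"""

import bisect


class SortedPrefixRepoMapping:
    def __init__(self, manifest_text):
        self.exact = {}  # (source, apparent) -> target
        wildcards = {}   # source prefix -> {apparent: target}
        for line in manifest_text.splitlines():
            if not line:
                continue
            source, apparent, target = line.split(",")
            if source.endswith("*"):
                wildcards.setdefault(source[:-1], {})[apparent] = target
            else:
                self.exact[(source, apparent)] = target
        # The sorted prefixes play the role of a TreeMap's key set.
        self.prefixes = sorted(wildcards)
        self.mappings = [wildcards[p] for p in self.prefixes]

    def lookup(self, source, apparent):
        if (source, apparent) in self.exact:
            return self.exact[(source, apparent)]
        # Equivalent of floorEntry(source): greatest prefix <= source.
        i = bisect.bisect_right(self.prefixes, source) - 1
        if i >= 0 and source.startswith(self.prefixes[i]):
            return self.mappings[i].get(apparent)
        return None


m = SortedPrefixRepoMapping(
    ",deps,deps+\n+deps+*,dep,+deps+dep1\n+other+*,dep,+other+x\n"
)
print(m.lookup("+deps+dep2", "dep"))
```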
Regarding this part of the comment from @fmeum:
"language imports using the runfiles library (rules_python has been discussing this at some point)"
We are also researching ways to lay out the files so that this would not be required (i.e. create a virtual env and place the files in a way that is natural to Python). Currently, no one is pursuing reading the runfiles manifest to implement an importlib alternative.
The virtual env approach is being researched here: https://github.com/bazelbuild/rules_python/pull/2617
@Wyverald I resolved the conflicts, this should be good for another review.
@Wyverald Friendly ping, let's merge this so that runfiles library work can start :-)
Any chance we can get this in? I back-ported this PR for my Java build, and for java_testsuite() the size of each repo mapping file dropped from 500MB to 100K! Before that, the build used to fill up my disk.
@bazel-io fork 8.3.0
Filed https://github.com/bazelbuild/bazel/issues/26262
FYI: I'm running into some mysterious cc_integration_test failures trying to submit (https://buildkite.com/bazel/google-bazel-presubmit/builds/92974#01978497-7ce4-45d2-816d-36f8088e33a9). Will dig a bit later.