Speed up tar packing by lowering the compression level and creating symbolic links for identical files
What's the problem
The current tar packaging process is not efficient enough, as evidenced by:
- The default gzip compression level "6" can take 200% more time than level "1", while reducing the size by only 10%-20% (see the timing sketch after this list).
- For tar packages that reach gigabyte sizes and are only transferred and used within an organization's intranet, packing speed may matter more than package size.
- When a tar package depends on other tar packages, each dependency must first be built and then decompressed again, so a single file may end up being compressed multiple times.
- If the runfiles directories of multiple `srcs` and `deps` projects reference the same file, i.e. in the case of diamond dependencies, `add_file` will create N identical copies, significantly increasing package size and packing time.
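To see the time/size trade-off concretely, here is a minimal, hypothetical timing sketch (not part of this PR) that compares gzip levels on an already-built, uncompressed archive; the file name `example.tar` and the chosen levels are illustrative only:

```python
import gzip
import time

def measure(level, payload):
    """Compress `payload` once at the given gzip level; return (seconds, bytes)."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    return time.perf_counter() - start, len(compressed)

if __name__ == "__main__":
    # Hypothetical input: an existing uncompressed tar archive.
    with open("example.tar", "rb") as f:
        payload = f.read()
    for level in (1, 6, 9):
        elapsed, size = measure(level, payload)
        print(f"level {level}: {elapsed:.2f}s, {size / 1e6:.1f} MB")
```

Actual numbers depend on the content, but lower levels generally trade a modest size increase for a large reduction in packing time, which is the pattern described above.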
How to solve
- add `compresslevel: str`, which can be `""` (auto, 6) | `"0"` | `"1"` | ... | `"9"` (a possible mapping onto Python's gzip writer is sketched after this list)
- provide `MappingManifestInfo` besides `DefaultInfo` to expose `manifest_file` and `package_dir` info to downstream targets
- add `merge_mappings: bool` to enable this behavior manually
- add `auto_deduplicate: bool` to identify added files across all manifest files by content paths and realpaths, and then auto-create symbolic links (see the deduplication sketch after this list)
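For illustration, a minimal sketch of how the proposed `compresslevel` string could be translated to Python's `tarfile` gzip writer; the helper name `open_tar_for_write` is hypothetical and not part of the existing rule implementation:

```python
import tarfile

def open_tar_for_write(path, compresslevel=""):
    """Open a gzip-compressed tar writer for the proposed string attribute.

    "" means "auto" (the gzip default, 6); "0".."9" pick an explicit level,
    where lower levels trade compression ratio for packing speed.
    """
    level = 6 if compresslevel == "" else int(compresslevel)
    if not 0 <= level <= 9:
        raise ValueError(f"compresslevel must be '' or '0'..'9', got {compresslevel!r}")
    return tarfile.open(path, "w:gz", compresslevel=level)
```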
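And a rough sketch of what `auto_deduplicate` could look like when the entries collected from all manifests are finally written out; the `(dest, src)` entry format and the function name are simplified assumptions for illustration, not the actual manifest format:

```python
import hashlib
import os
import posixpath
import tarfile

def add_files_deduplicated(tar_path, entries, level=1):
    """Pack (dest, src) entries, replacing repeated sources with symlinks.

    The first time a piece of content is seen it is stored normally; later
    destinations whose sources resolve to the same realpath or have identical
    content become relative symlink members instead of extra copies.
    """
    digest_by_realpath = {}  # memo so each realpath is hashed only once

    def content_key(src):
        real = os.path.realpath(src)
        if real not in digest_by_realpath:
            with open(real, "rb") as f:
                digest_by_realpath[real] = hashlib.sha256(f.read()).hexdigest()
        return digest_by_realpath[real]

    first_dest = {}  # content digest -> dest path of the first stored copy
    with tarfile.open(tar_path, "w:gz", compresslevel=level) as tar:
        for dest, src in entries:
            key = content_key(src)
            if key not in first_dest:
                first_dest[key] = dest
                tar.add(src, arcname=dest)
            else:
                link = tarfile.TarInfo(name=dest)
                link.type = tarfile.SYMTYPE
                # Point the symlink at the first copy, relative to dest's directory.
                link.linkname = posixpath.relpath(
                    first_dest[key], posixpath.dirname(dest) or ".")
                tar.addfile(link)
```

Keying by content hash also catches byte-identical files reached through different paths, while the realpath memo covers the diamond-dependency case where the same underlying file is referenced many times without re-reading it.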
These changes seem reasonable to me. However, I wonder if we should break it out into two or three PRs:
- Compression level
- MappingManifestInfo
- Deduplicate
What do you think?
Ah, in fact my original work in my private workspace consists of exactly 3 commits, one for each of these 3 features.
I'll split it tomorrow.