rules_pkg

Speed up tar packing by lowering compresslevel and creating symbolic links for identical files

Open gdh1995 opened this issue 1 year ago • 3 comments

What's the problem

The current tar packaging process is not efficient enough, as evidenced by:

  • The default gzip compression level "6" can take 200% more time than compression level "1", but the size is only reduced by 10%-20%.
  • For tar packages that reach gigabyte levels and are only transmitted and used within an organization’s intranet, packing speed might be more critical than size.
  • If a package depends on other tar packages, each of those packages must first be created and then decompressed, so a single file may be compressed multiple times.
  • If the runfiles directories of multiple srcs and deps targets reference the same file (i.e., diamond dependencies), add_file creates N identical copies, significantly increasing package size and packing time.
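The speed/size trade-off in the first bullet can be checked locally with a small script. This is only a sketch using Python's standard gzip module, not rules_pkg code; the payload is synthetic and the helper name `compress_stats` is made up for illustration. Actual ratios depend heavily on the data being packed.

```python
import gzip
import time


def compress_stats(data: bytes, level: int) -> tuple[int, float]:
    """Return (compressed size in bytes, elapsed seconds) for one gzip level."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    return len(compressed), time.perf_counter() - start


# Synthetic, fairly compressible payload (an assumption; not real package data).
payload = b"".join(
    b"line %d: some repetitive log-like text\n" % i for i in range(200_000)
)

for level in (1, 6):
    size, seconds = compress_stats(payload, level)
    print(f"level {level}: {size} bytes in {seconds:.3f}s")
```

Running this against a representative sample of your own package contents gives a better basis for choosing a level than the synthetic payload does.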

How to solve

  1. Add compresslevel: str, which can be "" (auto, i.e. 6) | "0" | "1" | ... | "9".
  2. Provide MappingManifestInfo in addition to DefaultInfo, to expose manifest_file and package_dir info to downstream targets.
     • Add merge_mappings: bool to enable this behavior manually.
  3. Add auto_deduplicate: bool to identify added files across all manifest files by content paths and realpaths, and then auto-create symbolic links.
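To make items 1 and 3 concrete, here is a rough sketch using Python's standard tarfile module. It is not the code from this PR: the helper names `add_deduplicated` and `build_archive` are hypothetical, and it detects duplicates by hashing file content with SHA-256 as a stand-in for the path/realpath comparison described above.

```python
import hashlib
import io
import posixpath
import tarfile


def add_deduplicated(tar: tarfile.TarFile, files: dict[str, bytes]) -> None:
    """Add files to the archive; repeated content becomes a relative symlink."""
    first_path_by_digest: dict[str, str] = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        info = tarfile.TarInfo(name=path)
        original = first_path_by_digest.get(digest)
        if original is None:
            # First time this content is seen: store it as a regular file.
            first_path_by_digest[digest] = path
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
        else:
            # Duplicate content: store a symlink pointing at the first copy.
            info.type = tarfile.SYMTYPE
            link_dir = posixpath.dirname(path) or "."
            info.linkname = posixpath.relpath(original, link_dir)
            tar.addfile(info)


def build_archive(files: dict[str, bytes], compresslevel: int = 1) -> bytes:
    """Build a .tar.gz in memory at the given gzip compression level."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz", compresslevel=compresslevel) as tar:
        add_deduplicated(tar, files)
    return buffer.getvalue()


# Two targets that both ship the same shared file (a diamond dependency).
example_files = {
    "app_a/lib/shared.txt": b"same bytes",
    "app_b/lib/shared.txt": b"same bytes",
    "app_b/main.txt": b"different",
}
archive_bytes = build_archive(example_files, compresslevel=1)
```

Relative link targets keep the archive relocatable, since the symlinks resolve correctly wherever the tarball is extracted.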

gdh1995 • Aug 26 '24 09:08

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] • Aug 26 '24 09:08

These changes seem reasonable to me. However, I wonder if we should break it out into two or three PRs:

  • Compression level
  • MappingManifestInfo
  • Deduplicate

What do you think?

cgrindel • Aug 27 '24 13:08

Ah, in fact my original work in my private workspace has exactly 3 commits, one for each of these 3 features.

I'll split it tomorrow.

gdh1995 • Aug 27 '24 14:08