
Add attribute to set compression type and level for pkg_tar

Open siddharthab opened this issue 7 years ago • 3 comments

Description of the problem / feature request:

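pkg_tar does not have an attribute that exposes the compresslevel argument of tarfile. The default value for this argument is 9, which gives maximum compression at the slowest speed. In the measurements below, level 9 takes more than three times as long as level 6 (about 97 s vs. 29 s) and produces a file that is less than 1% smaller.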

If we don't want to have an extra attribute, then at least the default value should be 6, the same as the default of the gzip CLI.

https://github.com/bazelbuild/bazel/blob/0b6899be05d51088267ce213850b9ead96d68b8e/tools/build_defs/pkg/archive.py#L119
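For illustration, here is a minimal sketch of how a compression level can be passed through to Python's tarfile module. The helper name `make_tar_gz` and its `members` argument are hypothetical and are not part of pkg_tar or archive.py; the only library behavior relied on is that `tarfile.open` accepts a `compresslevel` keyword in `w:gz` mode.

```python
import io
import tarfile


def make_tar_gz(members, compresslevel=6):
    """Build a .tar.gz in memory from a {name: bytes} dict.

    Hypothetical helper for illustration; not the pkg_tar implementation.
    """
    buf = io.BytesIO()
    # In "w:gz" mode, tarfile.open forwards compresslevel to the gzip layer.
    with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=compresslevel) as tar:
        for name, data in sorted(members.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

An attribute on the rule would only need to carry an integer from 0 to 9 down to a call like this one.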

Feature requests: what underlying problem are you trying to solve with this feature?

Make pkg_tar faster for larger archives when using compression.

$ python --version
Python 2.7.12
$ docker pull rocker/r-ver:3.4.4
$ docker save rocker/r-ver:3.4.4 -o image.tar
$ cat test.py
import sys
import gzip
import shutil

with open('image.tar', 'rb') as f_in, gzip.open('image.tar.gz.' + sys.argv[1], 'wb', compresslevel=int(sys.argv[1])) as f_out:
  shutil.copyfileobj(f_in, f_out)
$ time python test.py 1
real	0m12.320s ... 
$ time python test.py 6
real	0m28.616s ...
$ time python test.py 9
real	1m37.240s ...
$ ls -lh image.tar.gz.*
... 237M ... image.tar.gz.1
... 217M ... image.tar.gz.6
... 216M ... image.tar.gz.9
$ time gzip -1 -k image.tar
real	0m11.855s ...
$ time gzip -6 -k image.tar
real	0m27.176s ...
$ time gzip -9 -k image.tar
real	1m32.707s ...
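
The same comparison can be sketched without a Docker image. The version below assumes Python 3 (the session above uses Python 2.7) and uses a synthetic, highly repetitive payload, which is an assumption made only so the script is self-contained; timings and sizes on real archives will differ. The function name `time_gzip_levels` is hypothetical.

```python
import gzip
import os
import time


def time_gzip_levels(data, levels=(1, 6, 9)):
    """Return {level: (seconds, compressed_size)} for gzip at each level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        results[level] = (time.perf_counter() - start, len(compressed))
    return results


if __name__ == "__main__":
    # Synthetic payload (an assumption): one block of text plus random bytes,
    # repeated. Substitute the bytes of a real archive for meaningful numbers.
    sample = (b"layer-metadata " * 64 + os.urandom(256)) * 2000
    for level, (seconds, size) in time_gzip_levels(sample).items():
        print(f"level {level}: {seconds:.3f}s, {size} bytes")
```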

What operating system are you running Bazel on?

Ubuntu

What's the output of bazel info release?

release 0.9.0

Have you found anything relevant by searching the web?

There is an opinion in many places that pkg_tar, or other Python-based archiving tools in Bazel repos, are slow because Python's gzip is slow. But that is not my experience in the example above: at every level, Python's gzip module runs within a few percent of the gzip CLI.

siddharthab, Apr 17 '18 02:04

While an attribute would be great, even better would be to have the default level depend on compilation mode. That is, -c opt -> 9, -c fastbuild -> 6 or maybe lower.
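
As a rough sketch of that idea: Bazel's compilation modes are `fastbuild`, `dbg`, and `opt`, so the default could be looked up per mode, with an explicit attribute value taking precedence. The names `DEFAULT_LEVEL_BY_MODE` and `pick_compresslevel` and the level chosen for `dbg` are assumptions for illustration, not rules_pkg behavior.

```python
# Hypothetical defaults per Bazel compilation mode; the values are illustrative.
DEFAULT_LEVEL_BY_MODE = {"opt": 9, "fastbuild": 6, "dbg": 6}


def pick_compresslevel(compilation_mode, explicit_level=None):
    """Return the gzip level to use: an explicit value wins over the mode default."""
    if explicit_level is not None:
        # gzip accepts levels 0 (no compression) through 9 (maximum).
        if not 0 <= explicit_level <= 9:
            raise ValueError("compresslevel must be between 0 and 9")
        return explicit_level
    # Unknown modes fall back to 6, the gzip CLI default.
    return DEFAULT_LEVEL_BY_MODE.get(compilation_mode, 6)
```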

Type is a little bit trickier; .deb does support zstd nowadays, and it would be great to have because it's so, so much faster than gzip at similar compression ratios. But now we're talking about requiring zstd to be present in the build, which it might be already for other things. So what we want there starts to look like a toolchain configuration that selects the compressor for each archive type.

adam-azarchs, May 03 '23 19:05

zstd could be an optional toolchain. pkg_tar could use it if found. I think that would break compatibility with Bazel 4.x, but I am sort of OK with that.
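
The "use it if found" fallback could look roughly like the sketch below. It probes PATH with `shutil.which`, which is a simplification: a real Bazel setup would resolve the tool through toolchain resolution rather than the host PATH. The function name `choose_compressor` and the returned file extensions are assumptions for illustration.

```python
import shutil


def choose_compressor(which=shutil.which):
    """Return (argv, extension), preferring zstd when available, else gzip.

    `which` is injectable so the lookup can be replaced in tests.
    """
    if which("zstd"):
        # -q silences progress output, -c writes the result to stdout.
        return ["zstd", "-q", "-c"], ".zst"
    return ["gzip", "-c"], ".gz"
```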

Compress level defaulting from compilation mode seems interesting.

aiuto, May 03 '23 20:05

#720 at least reduces this to 6

cameron-martin, Dec 19 '23 10:12