Use bz2 or lzma for python >= 3.3 ?

Open NotSqrt opened this issue 7 years ago • 13 comments

Hi,

  1. Just an idea: now that many Python projects are dropping Python 2.7 and mostly support only Python >= 3.4, a wheel that targets only Python 3.x could reduce its size by using bz2 or lzma.

Typical example: numpy-1.15.0-cp37-cp37m-manylinux1_x86_64.whl (size: 13845063 bytes), built only for Python 3.7, could shrink by about 50% using lzma (size: 6899142 bytes).

  2. A simpler idea, if keeping zlib: starting with Python 3.7, zipfile exposes the compresslevel parameter. When wheels are generated from Python 3.7, it could also help to set compresslevel to a value higher than 6.
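
For anyone who wants to try both ideas on their own files, here is a minimal sketch using only the standard library. It is not what bdist_wheel does; `zip_size` and `compare_methods` are hypothetical helper names, and the archives it builds are plain zip files rather than valid wheels. It assumes Python >= 3.7 for the `compresslevel` argument.

```python
import io
import zipfile


def zip_size(files, method, level=None):
    """Build a zip in memory from {archive_name: bytes} and return its size in bytes."""
    buf = io.BytesIO()
    # compresslevel is honoured for ZIP_DEFLATED and ZIP_BZIP2; it has no
    # effect for ZIP_STORED or ZIP_LZMA.
    with zipfile.ZipFile(buf, "w", compression=method, compresslevel=level) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def compare_methods(files):
    """Return the archive size for each compression method the zipfile module offers."""
    return {
        "stored": zip_size(files, zipfile.ZIP_STORED),
        "deflate-6": zip_size(files, zipfile.ZIP_DEFLATED, 6),
        "deflate-9": zip_size(files, zipfile.ZIP_DEFLATED, 9),
        "bzip2-9": zip_size(files, zipfile.ZIP_BZIP2, 9),
        "lzma": zip_size(files, zipfile.ZIP_LZMA),
    }
```

Calling `compare_methods` with the contents of an unpacked wheel gives a rough idea of what each method would save for that particular package; results depend heavily on the data.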

Thanks !

NotSqrt avatar Aug 09 '18 16:08 NotSqrt

Certainly not by default on any Python, and PyPI may consider refusing such wheels.

agronholm avatar Sep 24 '18 21:09 agronholm

The second idea sounds a bit safer. Do you have numbers for this?

agronholm avatar Sep 29 '18 11:09 agronholm

Hi @agronholm

Raising compresslevel with zlib is less interesting. For numpy-1.15.2-cp27-cp27mu-manylinux1_x86_64.whl:

  • uncompressed size: 51928 kibibytes
  • wheel available on PyPI: 13510.78 kibibytes
  • rebuilt wheel with compresslevel=9: 13380.84 kibibytes (0.96% smaller)

I combined the compresslevel parameter with a per-file choice: any file whose deflated form would be larger than the original is written as Stored rather than Deflated.
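
A rough sketch of that per-file choice, in case it is useful to others. This is a reconstruction under assumptions, not the exact script used for the numbers above; `pick_method` and `write_entry` are hypothetical names, and it needs Python >= 3.7 for `compresslevel`.

```python
import zipfile
import zlib


def pick_method(data, level=9):
    """Return ZIP_STORED when deflating `data` would not make it smaller."""
    # wbits=-15 produces a raw deflate stream, which is what zip entries contain.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    deflated = compressor.compress(data) + compressor.flush()
    if len(deflated) < len(data):
        return zipfile.ZIP_DEFLATED
    return zipfile.ZIP_STORED


def write_entry(zf, name, data, level=9):
    """Write one entry to an open ZipFile, choosing the method per file."""
    method = pick_method(data, level)
    # compresslevel is ignored for ZIP_STORED entries.
    zf.writestr(name, data, compress_type=method, compresslevel=level)
    return method
```

The data is deflated twice here (once to measure, once to write), which keeps the sketch short at the cost of build time.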

NotSqrt avatar Oct 01 '18 07:10 NotSqrt

Wheels are zipfiles which are compressed per-file, and the zip metadata (filenames) is not compressed. If you were to create the zip file with no compression "store" and then lzma the whole thing you would see better results. Compression algorithms work better on large inputs. Convincing others to accept those wheels would be the tricky part.
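
To see the effect, something like the following sketch compares a per-file LZMA zip with a "store"-only zip that is then compressed as a whole. The helper names `build_zip` and `stored_then_xz` are made up for illustration, and the second result is an .xz stream wrapping a zip, which no installer accepts as a wheel today.

```python
import io
import lzma
import zipfile


def build_zip(files, method):
    """Build a zip in memory from {archive_name: bytes} and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def stored_then_xz(files):
    """Store every file uncompressed, then compress the whole archive in one pass."""
    # Compressing the archive as a single stream lets the compressor exploit
    # redundancy across files and in the zip metadata (filenames, headers).
    return lzma.compress(build_zip(files, zipfile.ZIP_STORED))
```

Comparing `len(build_zip(files, zipfile.ZIP_LZMA))` with `len(stored_then_xz(files))` on a package with many small, similar files should show the difference most clearly.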

dholth avatar Oct 02 '18 20:10 dholth

Yeah, there are bound to be practical difficulties with compression other than zlib.

agronholm avatar Oct 03 '18 06:10 agronholm

I like the idea of improving compression. Does it make sense to use only zlib forever? But you'd have to update more tools than just bdist_wheel to pull it off. Another zipfile compression trick is to put one "stored" zipfile inside another one.
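
A small sketch of the nested-zip trick, for illustration only. The inner archive name `payload.zip` and the function `nested_zip` are assumptions made here, not part of any wheel specification.

```python
import io
import zipfile


def nested_zip(files, outer_method=zipfile.ZIP_DEFLATED):
    """Put a "stored" zip of `files` inside an outer zip that does the compressing."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)

    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", compression=outer_method) as zf:
        # The inner archive is one large entry, so the outer compressor sees
        # all files and their headers as a single input.
        zf.writestr("payload.zip", inner.getvalue())
    return outer.getvalue()
```

The outer archive stays readable by any tool that understands the chosen outer method, but a consumer has to know to open the inner archive as a second step.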

dholth avatar Oct 03 '18 12:10 dholth

I've done some groundwork for this in PR #316. Once I am sure that PyPI will reject bzip2/lzma based wheels, I can add support for other compression algorithms as well.

agronholm avatar Oct 25 '19 08:10 agronholm

Cool. It's sort of good and probably helpful for the few wheels that have big individual files.

dholth avatar Oct 27 '19 00:10 dholth

> I've done some groundwork for this in PR #316. Once I am sure that PyPI will reject bzip2/lzma based wheels, I can add support for other compression algorithms as well.

@agronholm, will PyPI reject bzip2/lzma? May I know the reason? Thanks.

I am trying to reduce the binary size of our wheel, and LZMA shows good potential. It would be great if wheel and PyPI could support it.

nbcsm avatar Aug 18 '20 07:08 nbcsm

Those are not supported on Python 2, and thus historically wheels did not support them. Going forward, the plan seems to be to have two layers in the zip where the actual content is xz compressed, making it even more efficient.
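
If you want to check which methods an existing archive uses, a sketch like this works with the standard library. The function names are hypothetical; the Python 2 check relies only on the fact that Python 2's zipfile module handles stored and deflated entries and nothing else.

```python
import zipfile

# Human-readable names for the compression constants the zipfile module defines.
METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflated",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def compression_methods(path_or_file):
    """Return the set of compression method names used by entries in a zip or wheel."""
    with zipfile.ZipFile(path_or_file) as zf:
        return {
            METHOD_NAMES.get(info.compress_type, str(info.compress_type))
            for info in zf.infolist()
        }


def readable_on_python2(path_or_file):
    """True if every entry uses a method that Python 2's zipfile can decompress."""
    return compression_methods(path_or_file) <= {"stored", "deflated"}
```

Passing the path of a built .whl file to `compression_methods` shows at a glance whether anything other than stored or deflated entries ended up in it.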

agronholm avatar Aug 18 '20 07:08 agronholm

Thanks for the quick response.

But if my package only targets Python 3, will there be any blocking issue with uploading my wheel to PyPI and letting users install my package via pip3?

nbcsm avatar Aug 18 '20 07:08 nbcsm

I honestly don't know. I know it's not supported or recommended.

agronholm avatar Aug 18 '20 07:08 agronholm

Got it, thanks.

nbcsm avatar Aug 18 '20 07:08 nbcsm