
[feature] Use multithreaded gzip

Open w3sip opened this issue 4 years ago • 5 comments

Compression is excruciatingly slow for large packages. This is mostly because Python's gzip module is single-threaded, so no matter how fast the build machine is, its extra cores are wasted when it comes to compressing a large package.

Switching Conan to mgzip should be fairly quick, and the performance gains it promises are significant. If it doesn't pan out, switching to an alternative compression method, such as lz4, would be a far bigger change, but may be worth looking at.
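
To illustrate the general idea behind multithreaded gzip tools, here is a minimal sketch using only the standard library. It is not mgzip's implementation and not anything Conan does: it splits the input into chunks, compresses each chunk as an independent gzip member in a thread pool, and concatenates the members, which is still a valid gzip stream. The function name `parallel_gzip`, the chunk size and the worker count are assumptions made for the example, and it relies on CPython's zlib releasing the GIL while compressing.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

# Hypothetical default; real tools tune this for their workloads.
CHUNK_SIZE = 4 * 1024 * 1024


def parallel_gzip(data, level=6, workers=4, chunk_size=CHUNK_SIZE):
    """Compress `data` as a sequence of independent gzip members.

    Concatenated gzip members form a valid gzip stream, so the result
    can be read back with the ordinary single-threaded gzip module.
    """
    # Always produce at least one member, even for empty input.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so members stay in sequence.
        members = pool.map(
            lambda chunk: gzip.compress(chunk, compresslevel=level), chunks)
        return b"".join(members)


if __name__ == "__main__":
    payload = b"conan package contents " * 500_000
    blob = parallel_gzip(payload)
    # The standard decompressor handles multi-member streams.
    assert gzip.decompress(blob) == payload
    print(len(payload), "->", len(blob), "bytes")
```

The trade-off is a slightly worse compression ratio, since each chunk is compressed without knowledge of the others, in exchange for output that any gzip reader can still decompress.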

w3sip avatar Sep 13 '20 06:09 w3sip

Hi @w3sip

This has been requested (lzma) before: https://github.com/conan-io/conan/issues/648

The thing is that, most likely, the compressor should be optimized for decompression speed. It seems that gzip is still the best: it compresses less, but it is faster, and we have found that unzipping is one bottleneck. The idea is that a package is zipped only once, but will very likely be unzipped many times.

We will be having a look at this while designing Conan 2.0; if there is something that can reasonably be done, yes, we would like to speed that up.

One thing that is almost undoable is adding a dependency unless it is bundled as a Python package and works robustly across platforms. Conan runs on many different platforms and architectures, so using native utilities is a no-go, because it is a nightmare to make them work. Do you have any suggestion for a Python package that can do such zipping and unzipping multithreaded and robustly?

memsharded avatar Sep 13 '20 20:09 memsharded

Well, I don't have first-hand experience with it, but mgzip (https://pypi.org/project/mgzip/) should do just that: it's a drop-in replacement for gzip that advertises a performance gain. It sounds like something that won't be too hard to test and adopt, if suitable. It should address all the points you've made about gzip compatibility as well. lzma would be cool, but I can understand why it's a much bigger (and potentially unsuitable) undertaking.

w3sip avatar Sep 13 '20 21:09 w3sip

https://pypi.org/project/mgzip/ stats:

  • 5 stars on GitHub
  • latest release 0.2 in March (> 6 months ago, also the latest commit)
  • 3k downloads/month (https://pypistats.org/packages/mgzip)

It seems it is not ready for production. Still, it looks very interesting; if it were a bit more maintained and stable, it could be very useful, so it may be good enough to experiment with a bit (if it is a drop-in replacement, could it be something that is opt-in by configuration?).

memsharded avatar Sep 13 '20 22:09 memsharded

This seems more promising:

https://github.com/pgzip/pgzip

This issue is causing problems for us: in some projects it takes 4 minutes to compress a single package with PDBs. We could mitigate it with CONAN_COMPRESSION_LEVEL=6, but PDBs are huge and compress well, and CONAN_COMPRESSION_LEVEL isn't configurable (AFAIK) on a per-package basis.
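
For anyone wanting to gauge what a lower level buys on their own data, the following is a rough sketch that times the standard gzip module at a few levels. The helper name `compare_levels` and the sample payload are made up for the example; real PDBs will behave differently, so measure with actual package contents.

```python
import gzip
import time


def compare_levels(data, levels=(1, 6, 9)):
    """Return {level: (compressed_size_bytes, seconds)} for each gzip level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        results[level] = (len(compressed), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in for debug-symbol data.
    sample = b"".join(
        b"symbol_%06d = 0x%08x\n" % (i, i * 7) for i in range(200_000))
    for level, (size, seconds) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes in {seconds:.3f} s")
```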

dobragab avatar Aug 09 '22 19:08 dobragab

I understand the case, but I am afraid that https://github.com/pgzip/pgzip is still far from being usable in production by Conan. The project should be more stable, with PyPI packages and a reasonable release and maintenance history.

memsharded avatar Aug 09 '22 23:08 memsharded

Thanks everyone for your suggestions and input. While this is an improvement worth having, it's one that would take quite a bit of effort to implement. (If it were easy, I suppose Python would already implement this functionality! And as @memsharded mentioned, the available packages do not seem to be a viable option for production just yet.) So after considering it, we're postponing looking further into this to 2.X, for when more pressing things are dealt with :)

AbrilRBS avatar Jan 02 '23 13:01 AbrilRBS

We implemented a workaround in the pre_upload hook: we basically set CONAN_COMPRESSION_LEVEL dynamically based on the size of the package folder.
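
A minimal sketch of the size-based selection described above might look like the following. It is not the actual hook: the helper names (`folder_size_bytes`, `pick_compression_level`, `apply_compression_level`) and the size thresholds are hypothetical, and how the package folder path is obtained inside the hook is left out. The only thing taken from the thread is that the level is passed through the CONAN_COMPRESSION_LEVEL environment variable.

```python
import os

# Hypothetical thresholds: (max folder size in bytes, gzip level).
SIZE_TO_LEVEL = [
    (100 * 1024 * 1024, 9),    # up to 100 MiB: best ratio
    (1024 * 1024 * 1024, 6),   # up to 1 GiB: balanced
]
FALLBACK_LEVEL = 1             # anything larger: fastest


def folder_size_bytes(folder):
    """Sum the sizes of all regular files under `folder`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def pick_compression_level(size_bytes):
    """Map a folder size to a gzip level using the table above."""
    for limit, level in SIZE_TO_LEVEL:
        if size_bytes <= limit:
            return level
    return FALLBACK_LEVEL


def apply_compression_level(package_folder):
    """Set CONAN_COMPRESSION_LEVEL for this process based on folder size."""
    level = pick_compression_level(folder_size_bytes(package_folder))
    os.environ["CONAN_COMPRESSION_LEVEL"] = str(level)
    return level
```

Calling `apply_compression_level(package_folder)` from the hook before the package is compressed would then lower the level only for the large packages.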

Also looking forward to using the metadata feature in Conan so we can publish the PDBs there instead.

stackfun avatar Sep 20 '23 21:09 stackfun