[feature] Use multithreaded gzip
Compression is excruciatingly slow for large packages. This is mostly because the Python gzip module is single-threaded, so no matter how fast the build machine is, most of it sits idle while a large package is being compressed.
Switching Conan to mgzip should be fairly quick, and the performance gains it promises are significant. If it doesn't pan out, switching to an alternative compression method such as lz4 would be a far bigger change, but may be worth looking at.
Hi @w3sip
This has been requested (lzma) before: https://github.com/conan-io/conan/issues/648
The thing is that the compressor should most likely be optimized for decompression speed. gzip still seems to be the best fit: it compresses less, but it is faster, and we have found that unzipping packages is one of the bottlenecks. The idea is that a package is zipped only once, but it will very likely be unzipped many times.
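The ratio-vs-speed trade-off behind that reasoning can be illustrated with stdlib modules alone; this is a rough sketch (the data, sizes, and timings are made up for illustration and will vary by machine), not a benchmark of Conan itself:

```python
# Compare gzip (faster, larger output) with lzma (slower, smaller output)
# on the same compressible payload, using only the standard library.
import gzip
import lzma
import time

data = b"conan package payload " * 50000  # ~1 MB of repetitive data

t0 = time.perf_counter()
gz = gzip.compress(data, compresslevel=9)
gz_time = time.perf_counter() - t0

t0 = time.perf_counter()
xz = lzma.compress(data, preset=9)
xz_time = time.perf_counter() - t0

print(f"gzip: {len(gz)} bytes in {gz_time:.3f}s")
print(f"lzma: {len(xz)} bytes in {xz_time:.3f}s")
# Typically lzma produces the smaller archive but takes noticeably longer
# to compress; decompression speed is what matters most for consumers.
```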
We will be having a look at this while designing Conan 2.0, if there is something that can be reasonably done, yes, we would like to speed that up.
One thing that is almost infeasible is adding a dependency that is not bundled as a pure-Python package and that works robustly across platforms. Conan runs on many different platforms and architectures, so using native utilities is a no-go: making them work everywhere is a nightmare. Do you have any suggestion for a Python package that can do such zipping and unzipping multithreaded and robustly?
Well -- I don't have first-hand experience with mgzip (https://pypi.org/project/mgzip/), but it should do just that: it's a drop-in replacement for gzip that advertises a significant performance gain. It sounds like something that wouldn't be too hard to test and adopt, if suitable, and it should address all the points you've made about gzip compatibility as well. lzma would be cool, but I can understand why it's a much bigger (and, potentially, unsuitable) undertaking.
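A "drop-in" swap might look something like the sketch below. The stdlib path is runnable as-is; the mgzip call is commented out because mgzip is an external dependency (its `thread` keyword is taken from the PyPI page and is an assumption here, not something verified in Conan's code):

```python
import gzip
import io
import os
import tarfile
import tempfile

def compress_package(tar_bytes: bytes, dest: str, threads: int = 4) -> None:
    """Write gzip-compressed tar data to `dest`."""
    # Single-threaded stdlib version (roughly what Conan does today):
    with gzip.open(dest, "wb", compresslevel=9) as f:
        f.write(tar_bytes)
    # Hypothetical multithreaded drop-in with the same call shape
    # (`thread` keyword as described on mgzip's PyPI page):
    # import mgzip
    # with mgzip.open(dest, "wb", thread=threads, compresslevel=9) as f:
    #     f.write(tar_bytes)

# Build a tiny tar archive in memory and compress it.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("hello.txt")
    payload = b"hello conan"
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

dest = os.path.join(tempfile.mkdtemp(), "package.tgz")
compress_package(buf.getvalue(), dest)
```

Because the call shape is identical, swapping the backend behind a configuration flag would be a small, contained change.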
https://pypi.org/project/mgzip/ stats:
- 5 stars on GitHub
- latest release 0.2 in March (> 6 months ago, which is also the latest commit)
- 3k downloads/month (https://pypistats.org/packages/mgzip)
It seems it is not ready for production. Still, it looks very interesting; if it were a bit more maintained and stable it could be very useful. Maybe it is good enough to experiment with a bit (if it is a drop-in replacement, could it be opt-in by configuration?)
This seems more promising:
https://github.com/pgzip/pgzip
This issue is causing problems for us: in some projects it takes 4 minutes to compress a single package with PDBs. We could mitigate it with CONAN_COMPRESSION_LEVEL=6, but PDBs are huge and compress well, and CONAN_COMPRESSION_LEVEL isn't configurable on a per-package basis (AFAIK).
I understand the case, but I am afraid that https://github.com/pgzip/pgzip is still far from being usable in production by Conan. The project should be more stable, with PyPI packages and a reasonable release and maintenance history.
Thanks everyone for your suggestions and input. While this is an improvement worth having, it would take quite a bit of effort to implement (if it were easy, I suppose Python would already provide this functionality! And as @memsharded mentioned, the available packages do not seem to be a viable option for production just yet), so after considering it we're postponing further work on this to 2.X, for when more pressing things are dealt with :)
We implemented a workaround in the pre_upload hook: we basically set CONAN_COMPRESSION_LEVEL dynamically based on the size of the package folder.
Also looking forward to using the metadata feature in Conan, so we can publish the PDBs there instead.