[feature] Multithreaded conan cache save
What is your suggestion?
Hi! Currently the `conan cache save` command can be slow if the cache is somewhat large. tgz archives cannot be compressed in parallel, but I think it would be great to support some other archiving format that has a parallel compression algorithm.
Have you read the CONTRIBUTING guide?
- [x] I've read the CONTRIBUTING guide
Hi @PodnimatelPingvinov
Thanks for your feedback.
As a quick hint, have you tried changing the core.gzip:compresslevel conf? It can have an important effect on compression speed (it is a tradeoff between speed and final size).
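To illustrate that tradeoff with plain Python's `gzip` module (the same compression Conan uses for the tgz; actual numbers depend on the cache contents, and the sample data here is synthetic):

```python
# Compare compression time and output size at different gzip levels.
# The sample payload is illustrative, not real cache contents.
import gzip
import time

data = b"conan package cache contents " * 50_000  # ~1.4 MiB of compressible bytes

for level in (1, 6, 9):
    start = time.perf_counter()
    out = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {len(out)} bytes in {elapsed:.4f}s")
```

Lower levels finish noticeably faster at the cost of a somewhat larger archive.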
At this moment, this kind of performance optimization is not planned. Many other features, both functional and non-functional (like optimizations in other areas of the codebase), have higher priority, as they are more common or sit in a more critical path. The intended use cases of `conan cache save` are mostly saving individual packages or a defined set of packages from the cache for more frequent operations; packing the whole cache should be a very extraordinary and unusual operation.
If you or someone else would like to explore this, the implementation of the zipping should be relatively easy to locate:
```python
def save(self, package_list, tgz_path):
    global_conf = self.conan_api.config.global_conf
    cache = PkgCache(self.conan_api.cache_folder, global_conf)
    cache_folder = cache.store  # Note, this is not the home, but the actual package cache
    out = ConanOutput()
    mkdir(os.path.dirname(tgz_path))
    name = os.path.basename(tgz_path)
    compresslevel = global_conf.get("core.gzip:compresslevel", check_type=int)
    with open(tgz_path, "wb") as tgz_handle:
        tgz = gzopen_without_timestamps(name, mode="w", fileobj=tgz_handle,
                                        compresslevel=compresslevel)
        for ref, ref_bundle in package_list.refs().items():
            ref_layout = cache.recipe_layout(ref)
            ...
```
Note that for something like this to be considered, it should provide some compelling evidence of time savings. Also, and very importantly, it shouldn't use any other external Python library, and it should be completely cross-platform: many libraries out there for parallel compression use system bindings, which are a no-go in terms of distributing the Conan application in a portable and robust way.
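Under those constraints (stdlib-only, cross-platform), one possible direction is worth noting: the gzip format allows multiple compressed "members" to be concatenated into one valid stream, and CPython's `zlib` releases the GIL while compressing, so chunks can be compressed on plain threads. A minimal sketch, with illustrative names and chunk size (not Conan code):

```python
# Sketch: gzip-compatible parallel compression using only the standard
# library. Concatenated gzip members decompress as one stream with the
# regular gzip/tar tools and with Python's gzip module.
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk (tunable, illustrative)

def parallel_gzip(data: bytes, compresslevel: int = 6, workers: int = 4) -> bytes:
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    # CPython's zlib releases the GIL during compression, so a thread
    # pool gives real parallelism without spawning processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = pool.map(lambda c: gzip.compress(c, compresslevel=compresslevel),
                           chunks)
    return b"".join(members)
```

Chunking slightly reduces the compression ratio (matches cannot cross chunk boundaries), which is part of the evidence such a proposal would need to quantify.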
Thanks for providing a hint, I'll give it a try. My use case is CI pipelines: I tried to use this command to save the Conan cache and then transfer it between jobs, but the save operation took way too much time. An alternative here is to manually archive and then extract the .conan2/p directory; is this a good approach?
My use case is CI pipelines, I tried to use this command to save conan cache and then transfer it between jobs
The recommendation is to use the --list argument to provide a package list of the packages to transfer between jobs, which are usually just a few (the ones that have been built by the current job), not the whole cache, as that wouldn't be very efficient. Conan has other recommended mechanisms, like using a server close to the agents for downloads, enabling the file download cache, and making the cache persistent.
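A sketch of that CI flow with the --list argument (the package pattern and file names are illustrative):

```shell
# Save only the packages built in this job, not the whole cache.
conan list "mypkg/1.0:*" --format=json > built.json
conan cache save --list=built.json --file=cache.tgz

# ...in a later job, after transferring cache.tgz:
conan cache restore cache.tgz
```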
An alternative here is to manually archive and then unzip .conan2/p directory, is this a good approach?
No. Conan maintains a DB with information about the cache, which is not relocatable; the cache is not just plain files, so this won't work. It is necessary to use the save/restore commands.
In light of other requests such as https://github.com/conan-io/conan/issues/18255, we are considering the possibility of adding a Conan extension plugin that would allow users to define their own compression and decompression routines, which would make it possible to implement this custom multithreaded conan cache save.
That's cool, thanks! Setting the compression level to the lowest possible (conan cache save -cc core.gzip:compresslevel=1 '*:*') helped me a lot, getting pipelines to run in a reasonable amount of time. It's not the best solution though, because it leads to double compression if you use GitLab CI (it always compresses caches and you can't disable it). So if the plugin allowed no compression at all, that would be even better for this particular use case.
But I think that 0 is a valid value, which results in only "tarring" without any compression at all. Did you try that?
It indeed works, thanks!
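For reference, this behavior can be checked with Python's `gzip` module directly: level 0 keeps the gzip framing but stores the bytes uncompressed, so the result is effectively just a tar with a thin wrapper (the sample data below is illustrative):

```python
# compresslevel=0 emits stored (uncompressed) deflate blocks, so the
# output is the input plus a small amount of framing overhead.
import gzip

data = b"example payload " * 1024                # 16 KiB of sample bytes
stored = gzip.compress(data, compresslevel=0)    # stored blocks, no compression
packed = gzip.compress(data, compresslevel=9)    # full compression, for contrast
print(len(data), len(stored), len(packed))
```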
This does not really belong here, but I will leave it here nevertheless: Conan should offer a method to compress files that uses the plugin's algorithm. Conan has an unzip method (not sure if it uses the plugin), but it should also offer a zip/compress method.
Why? Uploading metadata is not feasible if it is not compressed (or at least tar-ed), and this currently needs to be done manually. This metadata should use the same (and best) compression algorithm.
Note: we have around 1 GB of debug symbols, and zipping those takes 15 min with the best Python (pre-3.13) zip algorithm (LZMA, single-threaded), with roughly a 10x improvement when using multithreaded compression algorithms. I think this is a common use case, so this should be offered to everyone.
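Today, that manual packing step can be sketched with the standard library's `tarfile` (the function name and paths are hypothetical, not a Conan API):

```python
# Sketch: manually packing a metadata folder (e.g. debug symbols) into a
# single .tgz artifact before uploading it. A low compresslevel keeps
# packing fast for large artifacts; raise it if transfer size matters more.
import tarfile

def pack_metadata(src_dir: str, out_path: str, compresslevel: int = 1) -> str:
    with tarfile.open(out_path, "w:gz", compresslevel=compresslevel) as tf:
        tf.add(src_dir, arcname=".")  # store paths relative to the folder root
    return out_path
```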