Optimize conan install --update
Running conan install -u on my conanfile that links in 10 different packages takes almost 30 seconds when nothing gets updated. My guess is that it does something like:
```
for p in packages:
    hash(p)
    check_local(p)
    check_remote(p)
```
I think this can probably be optimized: if it's the hashes that take time, do them in parallel; if it's the calls to the remote, we should probably extend the API so that fewer calls can check more information.
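For illustration, a minimal sketch of the parallel idea in Python. The `local_manifest_hash`/`remote_manifest_hash` helpers are hypothetical stand-ins for whatever Conan actually does per package; this is not Conan's real internals:

```python
# Sketch only: the helpers below are hypothetical stand-ins, not Conan's API.
from concurrent.futures import ThreadPoolExecutor
import hashlib

PACKAGES = ["pkg%d/1.0" % i for i in range(10)]  # 10 deps, as in my case

def local_manifest_hash(ref):
    # Stand-in for hashing the locally cached manifest of `ref`.
    return hashlib.sha256(ref.encode()).hexdigest()

def remote_manifest_hash(ref):
    # Stand-in for one HTTP round-trip fetching the remote manifest hash.
    return hashlib.sha256(ref.encode()).hexdigest()

def needs_update(ref):
    return local_manifest_hash(ref) != remote_manifest_hash(ref)

# The checks are independent per package, so the slow parts (hashing and/or
# network round-trips) can overlap instead of running strictly one by one.
with ThreadPoolExecutor(max_workers=8) as pool:
    stale = [r for r, out in zip(PACKAGES, pool.map(needs_update, PACKAGES)) if out]
print("packages needing update:", stale)
```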
Why is this needed?
For a release build I wipe my conan dir on the CI to make sure I have the latest and greatest even if the version number hasn't changed. But for local developers I also want to make sure that it's fast to check for updates on all binary packages so that they don't sit and spend a lot of time debugging something already fixed.
Ideally our cmake run could execute conan install -u every time, but that requires the execution time to go down from 30 seconds to around 1-2 seconds.
Let me know if I can do some local profiling.
This is an issue for us too...though it's taking much longer. I was waiting for the logging updates to help diagnose the issue.
The truth is that when we first introduced the update feature after user requests, we didn't think of it as a very common command, and certainly not one to be run on every single developer conan install.
I have run with logging enabled against conan.io, with a simple project with 4 dependencies. My findings so far:
- Yes, it takes up to 4-5 seconds to do a check when there is nothing to update.
- To check for updates, conan actually needs to hit the server twice for each package (once for the package recipe, and once for the package binary).
- The API calls retrieve the remote manifests, which are checked against the local ones.
- Some API calls take 200ms, some up to 500ms, some are faster.
- In my case, it seems that at least 50-60% of the time is network transfer (a rough cost model is sketched below).
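Those numbers alone roughly explain the timing. A back-of-the-envelope cost model, using the measurements above and assuming the calls run strictly sequentially:

```python
# Rough cost model, assuming the two manifest requests per package run
# sequentially and nothing else dominates.
n_packages = 4
requests_per_package = 2            # recipe manifest + binary manifest
latency_range_s = (0.2, 0.5)        # observed 200-500 ms per API call

low = n_packages * requests_per_package * latency_range_s[0]
high = n_packages * requests_per_package * latency_range_s[1]
print(f"expected check time: {low:.1f}-{high:.1f} s")  # ~1.6-4.0 s
```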
I will try to run some checks locally later, against a conan_server. It seems that this could be a challenging optimization.
Can you outline the steps which occur when you specify --update?
Shouldn't it just be (roughly as in the sketch below):
- Download the recipe... if it's different from the local one, replace the package
- Else, download the manifest; if it's different from the local one, replace the package
- Else, use the local package
Why do you need the package binary?
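A minimal runnable sketch of the cascade proposed above, just to make the order of checks concrete. All helpers are hypothetical stand-ins, hardcoded so it runs; the real Conan logic differs:

```python
# All helpers are hypothetical stand-ins, not Conan's real API.
def fetch_remote_recipe(ref): return "recipe-rev-1"      # 1st round-trip
def fetch_remote_manifest(ref): return "manifest-rev-1"  # 2nd round-trip
def local_recipe(ref): return "recipe-rev-1"
def local_manifest(ref): return "manifest-rev-1"
def replace_package(ref): print(f"replacing {ref}")

def update_one(ref):
    # 1) Recipe differs -> replace and stop (only one round-trip spent).
    if fetch_remote_recipe(ref) != local_recipe(ref):
        replace_package(ref)
        return "updated: recipe changed"
    # 2) Recipe matched -> compare manifests (second round-trip only now).
    if fetch_remote_manifest(ref) != local_manifest(ref):
        replace_package(ref)
        return "updated: manifest changed"
    # 3) Nothing changed -> keep the local package, nothing downloaded.
    return "up to date"

print(update_one("zlib/1.2.13"))
```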
To check if something can be improved in Q2; having a look at it in 1.3.
Maybe the correct approach is to extend the REST API with a new method that does a bulk check of packages to update.
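Something like the following, purely hypothetical, endpoint (the URL and payload shape are invented for illustration; nothing like this exists in the current Conan server API): one POST carrying all references, one response with all remote hashes, so the whole graph costs a single round-trip.

```python
# Hypothetical bulk-check request; URL and schema are made up.
import json
from urllib import request

refs = ["zlib/1.2.13", "boost/1.81.0", "openssl/3.1.0"]
req = request.Request(
    "https://myserver.example/v2/manifests/bulk-check",   # made-up endpoint
    data=json.dumps({"references": refs}).encode(),
    headers={"Content-Type": "application/json"},
)
# One round-trip for the whole graph instead of 2 calls per package:
# with request.urlopen(req) as resp:
#     remote_hashes = json.load(resp)   # e.g. {"zlib/1.2.13": "<hash>", ...}
```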
We have recently been discussing this issue, with the possibility of introducing a new endpoint in the API v2 of the server.
However, the implementation seems complex, requiring additional logic in the server, so we have finally settled on parallel management of the API calls, following a similar approach/effort to the one for downloads/uploads.
Any update on this? Is there any plan to optimize update before Conan v2?
We are using Conan 1.45.0, and running install --update makes around 700 REST_API_CALL entries according to the Conan logs; this takes several minutes. I tried to enable parallel_download = 8 but it did not speed things up, it actually went slightly slower, so I assume the REST API calls are not parallelized.
Hi @cubanpit
No, there are no plans for this for Conan 1.X. And I am not sure to what extent Conan 2.0 will help with this. Some things have been optimized, leveraging the "immutability" assumptions when using revisions, but if the dependency graph is large, there will be many API calls to the server, and that might still be slow, as the calls have not been parallelized, nor have new server APIs been developed.
Some updated information about your dependency graph size, and maybe even some traces (activating the CONAN_TRACE_FILE that records API calls) would be nice to have. We might have a look at this again for Conan 2.X; the scope for 2.0 is already defined and it doesn't include this possible optimization.
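For reference, a trace is produced by pointing the CONAN_TRACE_FILE environment variable at a file before running the command. A small sketch to summarize one, assuming the JSON-lines format Conan 1.X traces use, where API calls are recorded with an "_action" of "REST_API_CALL":

```python
# Summarize a trace produced e.g. with:
#   CONAN_TRACE_FILE=/tmp/conan_trace.log conan install . --update
# Assumes Conan 1.X JSON-lines traces with an "_action" field per entry.
import json
from collections import Counter

actions = Counter()
with open("/tmp/conan_trace.log") as trace:
    for line in trace:
        try:
            actions[json.loads(line).get("_action", "?")] += 1
        except json.JSONDecodeError:
            continue  # defensively skip any non-JSON lines

for action, count in actions.most_common():
    print(f"{action:20} {count}")   # e.g. REST_API_CALL  700
```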
Thanks for the quick response. I attach the trace (without parallel download enabled); let me know what is the best way to provide the dependency graph size if needed. conan_trace.log
Thanks for the trace. However, I see in the trace that a lot of downloads are actually happening. This might not be the same issue, but a different one. This issue is about --update that doesn't necessarily download: it starts with an already populated cache, and the update operation does not update anything, but still requires some time. Maybe you are talking about total installation time for a relatively big graph? I see we are talking about 82 packages, with their binaries. That is a lot of transfer and unzipping.
Well, I already ran conan install --update on the same conanfile.py twice before generating that trace, so I was not expecting any download. I just noticed it is downloading the packages in full, even if at the end it confirms that all of them are available in the local cache. Am I missing something?
Yes, this is not expected, but I couldn't say why. Unless the packages got new versions on the server; in that case it is the expected behavior, but if the packages are the same, they shouldn't be re-downloaded. There should be something else going on. Can you reproduce it, for example, using just a single package? Something like conan install <single-pkg-ref>, then conan install <single-pkg-ref> --update?
I can indeed reproduce it with a single package with almost no dependencies (conan_trace_single.log).
I guess poorly implemented recipes should not cause this; maybe our conanfile.py is invalidating the package somehow, or the GitLab implementation of the Conan package registry is not working as expected. I am still surprised that the package is indeed found in the cache, but only after a full download.
Could you point to any resource that could help track this down?
Oh, it could be a server-side issue that always returns a "should update". I'd say that is slightly more probable than something in the conanfile.py.
The first thing I would try is to validate whether it is a server issue: download ArtifactoryCE from our downloads page (it is completely free), run it locally on your computer (I can even run it on Windows by just double-clicking the app/bin/artifactory.bat launcher), and check if the behavior is the same. If it isn't, it is definitely worth reporting against GitLab, because it could be a bug on their side.
If the behavior does reproduce against Artifactory too, then share your conanfile.py and I could do a quick inspection to see if something there could be causing it.
Here are the traces in the two cases: conan_trace_artifactory.log conan_trace_gitlab.log
This seems to confirm a bug in the GitLab implementation, unfortunately. Thank you a lot for your support along the way. I wonder if there is any way to work around this behavior; in the meantime I will submit a bug report to GitLab.
EDIT: Link to GitLab issue: https://gitlab.com/gitlab-org/gitlab/-/issues/366425
Thanks very much @cubanpit for testing and following up. It does indeed seem to be a case on the GitLab side. You can link the issue here if you report it, so we can track it too. I'd recommend keeping your ArtifactoryCE around, at least locally, especially because Conan 2.0 works exclusively with API v2, and GitLab hasn't implemented it yet, so it will not be possible to test against it (even in Conan 1.X with revisions enabled, to be ready for 2.0). And 2.0 is looking fantastic; it will be a great improvement.
I will leave this ticket open, because the original issue of potential optimization of --update is still valid.
Just dropping by to update (heh) on current progress: while nothing has changed speed-wise regarding --update, it now understands patterns to save time when only one or a few recipes need updating, like --update=foo/* --update=bar/*, which might help alleviate some of the slowness in certain use-cases.