feat(appset): Add a cache layer for Argo Projects to speed-up application validation
In our Argo setup, we have several applications sets, and some of them can generate up to 2000 applications.
For those applications sets, reconcile loop can take until 2 or 3 minutes. After some investigation and debug, we found that in validateGeneratedApplications a call is done for each application to verify if the related AppProject exists.
Once we implemented a cache on AppProject, reconcile loop for appset with more 2K apps drops to ~15-20 seconds.
Number of call to k8s API, before/after using a dedicated image of the appset controller
Checklist:
- [ ] Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
- [ ] The title of the PR states what changed and the related issues number (used for the release note).
- [x] The title of the PR conforms to the Toolchain Guide
- [ ] I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
- [ ] I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
- [ ] Does this PR require documentation updates?
- [ ] I've updated documentation as required by this PR.
- [x] I have signed off all my commits as required by DCO
- [x] I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
- [ ] My build is green (troubleshooting builds).
- [ ] My new feature complies with the feature status guidelines.
- [ ] I have added a brief description of why this PR is necessary and/or what this PR solves.
- [ ] Optional. My organization is added to USERS.md.
- [ ] Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).
:x: Preview Environment deleted from Bunnyshell
Available commands (reply to this comment):
- :rocket:
/bns:deployto deploy the environment
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 55.88%. Comparing base (
fa54ce2) to head (add382e). Report is 432 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #18703 +/- ##
==========================================
+ Coverage 55.84% 55.88% +0.03%
==========================================
Files 321 321
Lines 44497 44497
==========================================
+ Hits 24851 24865 +14
+ Misses 17075 17060 -15
- Partials 2571 2572 +1
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
In fact, regarding PRs like these that improve performance, how can we ensure that such modifications are really effective? Or how much improvement has been made?
Hello,
Thanks for your feedback, to be honest I created this PR a little too early :smile:
We are starting today to run this patch-set in our production to have more accurate data, but sounds promising. On my laptop, for an ApplicationSet who generates 2K apps, I drop from 2mins to ~20secs. I went from 2K calls to K8s API to something like 15 (we have lots of apps spread in few projects.)
I will back in few days, we more data.
Hello,
Thanks for your feedback, to be honest I created this PR a little too early 😄
We are starting today to run this patch-set in our production to have more accurate data, but sounds promising. On my laptop, for an ApplicationSet who generates 2K apps, I drop from 2mins to ~20secs. I went from 2K calls to K8s API to something like 15 (we have lots of apps spread in few projects.)
I will back in few days, we more data.
In fact, I am also thinking about whether I should make a benchmark for this project so that the subsequent performance improvements will at least have a baseline.
@crenshaw-dev any idea for this ?
Hi,
I rebase my branch to be up-to-date and to continue to work on it, and now each time appset controller is creating/updating an application, I got this message informer is not a kubernetes informer coming from updateCache function, but I don't why ?
(same error with or without my change)
Note : in our production, we still continue to use the initial implementation of this PR with a dedicated cache layer.
I switch to use informer and cache capability offered by internal Client attached to the appset controller.
Hi, I'm back from holiday and ready to continue to work on this PR :smile:
@crenshaw-dev : PR is up-to-date with master branch, if you have time to have look :pray: Thanks
Hello, Any informations why this one is not embed with the 2.13 release ?
Hello, Any informations why this one is not embed with the 2.13 release ?
Hello, I'm not part of release process nor Argo maintainer, but I imagine my PR was merged in master after release-2.13 branch was started.
But @crenshaw-dev have more clue and can answer :smiley: