feat(syncwaves): use binary tree ordering for sync waves

Open SebastienFelix opened this issue 6 months ago • 14 comments

SebastienFelix avatar Jul 05 '25 12:07 SebastienFelix

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 48.63%. Comparing base (8849c3f) to head (cf27063). Report is 52 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #744      +/-   ##
==========================================
- Coverage   54.26%   48.63%   -5.64%     
==========================================
  Files          64       64              
  Lines        6164     6718     +554     
==========================================
- Hits         3345     3267      -78     
- Misses       2549     3194     +645     
+ Partials      270      257      -13     

codecov[bot] avatar Jul 05 '25 12:07 codecov[bot]

Is this trying to address the same issue as https://github.com/argoproj/gitops-engine/pull/514 ?

alexec avatar Jul 06 '25 02:07 alexec

@alexec: kind of, I guess! The main goal of this feature is to avoid wasting time waiting for unrelated resources to be synced before proceeding with a given resource.

I thought about defining the dependency graph the way you did, but I thought that managing circular dependencies would be too complicated.

The way I did it in this PR completely avoids this issue and avoids adding more complexity with another logical layer.

I opened an issue for that here: #734

My solution would allow the implementation of the feature described in #3517 by syncing the applications in parallel and preventing a stuck component of one application from blocking every resource with a greater sync wave value.

As an example from the mentioned issue: if infra-prometheus-operator gets stuck during deployment, it would only block the deployment of infra-prometheus-alertRules/grafanaDashboard/thanos.

Here are the sync wave values and the induced dependencies for issue #3517: image

WDYT?

P.S.: after creating this graph, I realized that the sync wave values would be much more human-readable using base 10 instead of base 2 in my graph. The graph would then look like this: image
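
To illustrate the idea above, here is a minimal sketch, not the PR's actual code. It assumes a heap-style numbering in which the root wave is 1 and the parent of wave n is n // 2, so a wave written in base 2 has its ancestors as prefixes. The wave numbers, the `root-app` and `other-app` names, and the helper functions are all hypothetical; only the infra-prometheus resource names come from the example above.

```python
from typing import Dict, List, Optional


def parent(wave: int) -> Optional[int]:
    """Parent wave under the assumed heap-style numbering (root is wave 1)."""
    return wave // 2 if wave > 1 else None


def depends_on(wave: int, ancestor: int) -> bool:
    """True if `ancestor` lies on the path from `wave` up to the root."""
    current = parent(wave)
    while current is not None:
        if current == ancestor:
            return True
        current = parent(current)
    return False


def blocked_by(stuck_wave: int, waves: Dict[str, int]) -> List[str]:
    """Names of resources whose wave descends from the stuck wave."""
    return sorted(name for name, w in waves.items() if depends_on(w, stuck_wave))


# Hypothetical wave assignment, for illustration only.
waves = {
    "root-app": 1,
    "infra-prometheus-operator": 2,
    "other-app": 3,
    "infra-prometheus-alertRules": 4,
    "infra-prometheus-grafanaDashboard": 5,
    "infra-prometheus-thanos": 8,
}

# Only the descendants of wave 2 are held back; "other-app" (wave 3) is not.
print(blocked_by(2, waves))
```

Because every wave has exactly one parent with a strictly smaller number, following parents always ends at the root, which is why this kind of numbering cannot produce a cycle.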

SebastienFelix avatar Jul 06 '25 10:07 SebastienFelix

Hello @alexec, I will gladly take any feedback if you think this is a bad implementation or design. This feature would be very nice to have, and I am willing to help with its implementation.

Best regards

SebastienFelix avatar Jul 17 '25 17:07 SebastienFelix

My take is that if I were to rewrite Argo from scratch, I'd use DAGs, not waves, in a heartbeat. Attack the problem root and stem. It's simply not possible to write a faster way to sync. I'm not actively involved much today, so I can't push these forward.

alexec avatar Jul 20 '25 20:07 alexec

I fully agree that a DAG would be the way to go. I think this approach was tried in the PR you mentioned, and it looks like I'm trying to apply bandages to a broken leg here (not saying the current implementation is a broken leg!).

My understanding is that guaranteeing that the dependency graph contains no cycles wouldn't be straightforward to check, nor would it be easy to pinpoint exactly what would need to be changed, since there would be many possible solutions.

I am just trying to implement a similar feature with backward-compatible code, without doing a global, long and risky refactoring. The way I implemented it is pretty simple and guarantees that there can't be any cycles.

I'm available to work on this feature for all of us to benefit from it, and don't want to waste anyone's time :)

SebastienFelix avatar Jul 20 '25 20:07 SebastienFelix

I think folks' main complaint about sync waves is the user experience, not the lack of support for sequential steps within parallel waves. (There's also the request for ordering within a broader context than a single app, but that's a much harder nut to crack).

crenshaw-dev avatar Jul 20 '25 21:07 crenshaw-dev

Well, from what I understand of the issue that led to your PR, the need was exactly parallel waves, as described in the graphs I posted.

At Amadeus, we would also benefit greatly from this feature. In our use case, we would like to be able to sync a bunch of apps in parallel (without one app's failure to sync impacting all the others), with all these apps being synced after a root app has been fully synced and validated. For now, instead of using sync waves across multiple PaaSes (which seems to be possible), we use an argo-workflow as a trigger, which adds more complexity.

SebastienFelix avatar Jul 20 '25 21:07 SebastienFelix

I think I'd find built-in DAG support with dependsOn within a single app context more attractive than a binary tree. It would solve parallelism and UX struggles.
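
As a rough sketch of what dependsOn-style ordering could look like (purely illustrative, not an existing Argo CD or gitops-engine API), the snippet below groups resources into batches that can be synced in parallel using Kahn's algorithm, and raises an error when the declared dependencies contain a cycle. The `sync_batches` function and the resource names are hypothetical.

```python
from typing import Dict, List, Set


def sync_batches(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group resources into batches; each batch only depends on earlier ones.

    `depends_on` maps a resource name to the names it must wait for.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Work on a copy so the caller's mapping is left untouched.
    remaining = {name: set(deps) for name, deps in depends_on.items()}

    for name, deps in remaining.items():
        unknown = deps - set(remaining)
        if unknown:
            raise KeyError(f"{name} depends on unknown resources: {sorted(unknown)}")

    batches: List[List[str]] = []
    while remaining:
        # Everything with no unmet dependency can be synced in parallel.
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        batches.append(ready)
        for name in ready:
            del remaining[name]
        for deps in remaining.values():
            deps.difference_update(ready)
    return batches


# Hypothetical resources, for illustration only.
example = {
    "root": set(),
    "operator": {"root"},
    "other": {"root"},
    "alert-rules": {"operator"},
}
print(sync_batches(example))
```

Note that this batching still waits for a whole batch before starting the next one; a scheduler that starts each resource as soon as its own dependencies are healthy would avoid unrelated resources blocking each other.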

But I think before bothering with either, I'd really want to interrogate your specific use case in detail. I think most use cases (aside from mutating webhooks) can be solved with retries and eventual consistency.

crenshaw-dev avatar Jul 20 '25 21:07 crenshaw-dev

Sure, we can discuss that outside of this thread, as it would probably be smoother with a quick call!

EDIT: I sent you a LinkedIn request.

SebastienFelix avatar Jul 20 '25 21:07 SebastienFelix

The gitops-engine repository is migrating to https://github.com/argoproj/argo-cd.

Can you please update the PR by resolving the conflicts? If it's not relevant anymore, feel free to close it. You can always re-open it when the migration is over.

ppapapetrou76 avatar Sep 11 '25 08:09 ppapapetrou76

Hello @ppapapetrou76, the conflict has been resolved. Even if it needs a big refactoring, I want to continue working on this PR.

SebastienFelix avatar Sep 19 '25 19:09 SebastienFelix