argo-cd icon indicating copy to clipboard operation
argo-cd copied to clipboard

Rollback aware auto-sync

Open stereobutter opened this issue 2 years ago • 10 comments

Summary

Currently auto-sync and rollback do not play nicely together. To perform a rollback auto-sync has to be disabled first and if reenabled happily will sync again to the commit that was rolled back previously. If argo remembered the commit that got rolled back and excluded it from auto-sync this would improve that situation a lot.

This might be the same or a similar issue as https://github.com/argoproj/argo-cd/issues/5351 but I found the issue description really unclear.

Motivation

What has to happen when a rollback is needed currently entails multiple steps to get an application back on track:

  1. disable auto-sync
  2. perform rollback
  3. revert/fix the offending commit in git
  4. reenable auto-sync

If auto-sync excluded the offending commit this would be reduced to

  1. perform rollback
  2. revert/fix the offending commit in git

Apart from saving a developer a bit of manual effort, where this feature would be really useful is for automating rollbacks. We currently tinker with using a post-sync hook for running some tests against our application and would like to automatically rollback using a fail hook (via the argo api [^1]). When a developer then releases a fix, the application would automatically sync again and everything would be back to normal.

[^1]: it appears this would currently not work without some level of indirection due to https://github.com/argoproj/argo-cd/issues/6494. Maybe one could use a argo-workflow that starts another workflow to do the rollback?

Proposal

Argo would grow a new option that when enabled would cause auto-sync to ignore a commit if it has been rolled back. For the basic use case it would probably suffice to just remember the last commit that was rolled back (compared to keeping a history of commits).

stereobutter avatar Jun 02 '22 14:06 stereobutter

linking https://github.com/argoproj/argo-cd/issues/8686 here as a way to connect the dots of all these requests

alexef avatar Apr 14 '23 16:04 alexef

This behavior surprised me. Right now, rolling back feels unsafe because Argo will happily re-deploy shortly after.

jsirianni avatar Jul 14 '23 15:07 jsirianni

+1

afernandez-01 avatar Jul 14 '23 20:07 afernandez-01

Im really interested how people solve this problem. In example in our stack this is how we currently do:

  1. We have staging, production git repos with argo manifest
  2. We have 2 environments, both managed by single argocd instance
  3. Each service has 2 application in argo (app-staging, app-prod)
  4. staging is auto-sync enabled always and sync from master where Prod is NOT set to auto-sync
  5. each app has "post hook" to run sync to prod. And this is where magic starts:
  6. Post hook runs mini golang cli application which does following: 6.1. Get production argo application manifest and checks it it was rollbacked:
```go
// status.sync.status - tells if application is Sync/OutOfSync
// status.sync.revision - desired state pulled from source (in our case git) automatically by argo
// status.operationState.operation.sync.revision - current deployed revision
// status.history - deployment history

Pseudo code would look like this:

```go
currentRev := resp.Status.OperationState.Operation.Sync.Revision // Current revision based on operation.
desiredRev := resp.Status.Sync.Revision
status := resp.Status.Sync.Status

if status == ApplicationSyncStatusSynced && currentRev == newRevision {
	return backoff.Permanent(fmt.Errorf("application %s is already synced to revision %s", appName, newRevision))
}

var latest int
if status == ApplicationSyncStatusOutOfSync {
	for _, item := range resp.Status.History {
	       if item.ID > latest {
			latest = item.ID
		}
	}

	for _, item := range resp.Status.History {
		if currentRev == item.Revision &&
			desiredRev != currentRev &&
			item.ID != latest {
				return backoff.Permanent(fmt.Errorf("application %s rolled back to revision %s, you should sync it manually", appName, currentRev))
				}
			}
		}

		if status != ApplicationSyncStatusOutOfSync {
			return fmt.Errorf("waiting for app %s to appear out-of-sync, current_revision=%s current_status=%s", appName, currentRev, status)
		}
		return nil
	})

This way we "try" (I say try as at this point its best we were able to do, so suggestion are welcome" ) to be more deterministic and not to sync over rollbacked production env.

mjudeikis avatar Aug 23 '23 10:08 mjudeikis

Is there any update on this topic? We have some argo workflows where we need this improvement. If someone solved it already, I will be really appreciated.

Our workflow: Run argo cd sync Run some tests after argo cd sync as post sync hook After this step we want to have auto rollback where argo cd do this automatically if post sync fails.

simply: Deploy-Post-sync if fails rollback if pass do nothing.

selcuktemizsoy avatar Oct 02 '23 15:10 selcuktemizsoy

Would also like to know why these two tools do not play well together on rollback..... when you need them to.

Kampe avatar Oct 04 '23 13:10 Kampe

Rollback button should be disabled (History should be view only mode) if auto-sync is enabled

chinkung avatar Mar 21 '24 08:03 chinkung

@chinkung With the current behavior, yes. Having the rollback button enabled can confusing. But if Git reconciliation is implemented for the rollback, then it should be enabled all the time. They are asking to implement the reconciliation in Git automatically, which makes sense tbh. Just because it is GitOps doesn't mean that there can't be nice UI features to change the source of truth.

AndresPinerosZen avatar Apr 25 '24 18:04 AndresPinerosZen

This issue affects us too. Users have to go through multiple steps, including creating PRs to disable auto-sync, just to perform a successful rollback without overwriting changes.

jluque0101 avatar Apr 26 '24 11:04 jluque0101

Hi Team,

Can we provide a mechanism where certain parameters are not controlled by Yaml auto-sync and are only controlled by the UI?

For example, we can add an UI control only parameter syncPolicy.uiControl.blockAutomatedSync (default false).

Assuming that our service is enabled with syncPolicy, the final configuration would be:

syncPolicy:
    uiControl:
      blockAutomatedSync: false
    automated:
      prune: true
      selfHeal: true

The user story would be as follows:

    1. When the user clicks on rollback in the UI, blockAutomatedSync is automatically set to true.
    1. Rollback to the specified version.
    1. Periodic/manual sync is triggered, and the syncPolicy mechanism detects the difference, but since blockAutomatedSync is true, it blocks sync.
    1. After the bug is fixed and the latest code is committed, it will still not sync.
    1. The user clicks "unblock automatedSync" in the UI and sets blockAutomatedSync to false.
    1. Periodic/manual sync is triggered and the latest version is deployed.

Is this feasible, or does anyone have concerns or better suggestions?

Thanks!

StyleTang avatar May 15 '24 03:05 StyleTang

Here for the notifications

billyatroadie avatar Sep 13 '24 11:09 billyatroadie