argo-cd icon indicating copy to clipboard operation
argo-cd copied to clipboard

Add a Resource phase that runs before all other phases

Open endophage opened this issue 1 year ago • 25 comments

Summary

There needs to be a way to create specified resources before anything else happens, i.e. before PreSync hooks run. If a PreSync hooks is to do something useful like run a database migration or notify an external service it likely needs some configuration that under the present model, won't get created or updated until the Sync phase. This causes PreSync hooks to fail and the operator is required to manually sync the specific required resources before running a full sync again. In an automated CI/CD pipeline, this creates pain points when say, database credentials get rotated.

Motivation

I discussed the bulk of my use case here: https://github.com/argoproj/argo-cd/discussions/9801

Laying it out again:

  • PreSync hooks aren't fit for running most tasks they might be used for because they run before secrets, configmaps and service accounts have been created/updated and therefore lack the configuration and permissions to do anything particularly useful.
  • There should be a way to create resources before PreSync hooks run.
  • This does not need to be automatic or do any smart detection, just give users another phase they can use to manually annotate resources that should be created/updated before anything else happens.

Proposal

  • Add another phase "Resources".
  • Ideally it would only be usable on static resources like Service Accounts, Config Maps, Secrets, etc... however I'm not sure how it would then take CRDs into account. For example I'm using aws-secret-operator (which I found via the ArgoCD docs and it's great) to load secrets from AWS into k8s Secret objects. I'd want my CRD AWSSecret resources to get updated during the "Resources" phase.
  • No smart detection is required. Having the phase defined simply enables engineers to annotate resources we know PreSync hooks will require and thereby ensure they're created/updated and available for the PreSync hooks.

endophage avatar Jul 06 '22 17:07 endophage

It's not a bad idea.

Another approach is to build more into your services. For example, with secrets using an init-container is probably best practice at this point because it's the most secure option. The nice thing about init-containers is that logic remains tied to the app itself rather than becoming some kind of meta object to manage.

todaywasawesome avatar Jul 06 '22 17:07 todaywasawesome

@todaywasawesome thanks for the reply. I'm not sure what you're suggesting with init containers. Our application is defined in a helm chart which Argo uses to render the k8s manifests so our secrets are already well managed alongside the app definition.

Possibly you're not familiar with aws-secret-operator, it takes the AWS secret name and the version ID in the form of an AWSSecret CRD and from that creates a k8s Secret containing the data from the AWS secret. That keeps everything sensitive out of our helm charts.

endophage avatar Jul 06 '22 23:07 endophage

@endophage, you can make any resource a Helm hook. When running multiple Helm hooks of the same priority, the normal resource ordering applies. So if you give your ConfigMaps/Secrets the same hook annotations as the Jobs that need them, those should be deployed first, then the Job(s), then all the non-hook resources.

If you don't trust the default resource ordering, you can even set the ConfigMaps to a higher hook priority, so they are deployed in an earlier round of hooks!

This is what I do in my app, and it works a treat!

scalen avatar Jul 12 '22 21:07 scalen

@scalen you said "Helm" hooks. Did you actually mean that and you're relying on ArgoCD's translation of Helm hooks to equivalent ArgoCD phases? Additionally, what happens on a re-deploy (I can and will test this but if you happen to have the info it would be great to have another point of reference/confirmation)? One of my concerns was ArgoCD would execute cleanup code and delete the configmaps/serviceaccounts/secrets. A misconfiguration could potentially then cause them to not get re-created (and really, deleting them even momentarily isn't acceptable to me), breaking the existing pods. Ideally I'd want to see it do a simple apply to update the existing resources.

endophage avatar Jul 14 '22 18:07 endophage

Yes, I use Helm hooks and let ArgoCD translate them to ArgoCD hooks, so that the chart remains portable.

I have always specified hook-based ConfigMaps and secrets for my hook-based jobs, and separate ConfigMaps and Secrets in/for the "main" set of resources, so haven't seen how the main application behaves when relying on hook-based data resources...

However, I have never seen a "delete-before-creation" hook fail to recreate after the last copy was deleted. Furthermore, ConfigMaps and Secrets deploy so quickly that they would not be missed during their brief absence. Finally, in the case there is a race there (which could only occur in the case of inopportune application scaling), a pod will simply fail to schedule until the CM or Secret it relies upon is present, at which point it will swiftly deploy. Deleting a data resource will not affect running pods, as the data is mounted in at pod startup.

If you really want a multi-stage update with resource dependencies between different stages, you probably want to incorporate something like Argo Workflows, though I am not very familiar with that project.

scalen avatar Jul 14 '22 22:07 scalen

@scalen Thanks for all the extra info. I've been looking at workflows, mainly for the canary and blue/green deployments.

In the discussion (linked in my initial issue) it was suggested I could move my PreSync hooks to Sync hooks then use sync-waves. That sort of worked. I set the config maps, secrets, etc... to wave -1 (as shown in the docs). That worked for creating the resources, but I saw my deployments, that had no sync-wave set, being updated in parallel with the Sync hooks.

I've now set sync-waves on the Deployments and they correctly wait for the hooks to complete.

It seems like there are a few bugs with hooks and sync-waves, and I think my initial proposal still stands. PreSync hooks aren't particularly useful if you can't deploy configmaps, secrets, service accounts, etc... before the hooks run.

endophage avatar Jul 15 '22 20:07 endophage

@endophage I'm a little confused by what your use-case might be... You seem to be suggesting setting Deployments as presync hooks? That makes little sense in the first place, as presync hooks are meant to be workloads that complete, i.e. jobs or pods (and the static resources, like ConfigMaps, on which such workloads rely). Any resources other than jobs and pods will get "deployed" at the same time, where "deploying" is simply applying the manifest and getting a success response: a Deployment as a hook would not be waited on until it is healthy, or even have any pods scheduled, before moving to the next priority or main release.

You also talk about Sync Hooks, as well as Sync Waves. These are not the same thing:

  • Sync Hooks have the same semantics as PreSync Hooks, but start at the same time that the Sync itself starts (not before, even with lower priority than 0), and happen in parallel to the Sync; they can therefore not prevent any part of the Sync, unlike PreSync hooks, because they do not happen strictly before any part of the Sync. You have been describing this as a bug, but it is the expected behaviour.
  • Sync Waves control the order of release within a Sync itself (sorted by Sync Wave, then resource kind), but do not otherwise affect how resources are monitored and updated during Syncs (unlike hooks, which are untracked, and always recreated instead of updated). As with the default ordering of resources in a release (by kind only), if an earlier category fails, later categories won't be attempted.

If you have Deployments that need to start and become healthy strictly before other Deployments (or other resources that would normally be installed/updated before Deployments), or resources that should never be deleted, but need deploying outside of the normal order, then yes, Sync Waves are the more useful feature. If I remember rightly, they work better if you categorise everything into explicit waves, because the default isn't necessarily intuitive, i.e. I don't believe it is the same semantics as hooks and their prioritisation: Waves are not a way of doing things "around" releases, but rather a way of being explicit about the deploy order within a release, in cases where the default deploy order isn't sufficient.

scalen avatar Jul 17 '22 23:07 scalen

@scalen apologies if I've been unclear. I do not want to run Deployments in a PreSync hook, and I understand the difference between Phases, Hooks, and Waves.

Boiling it down to the simplest example, I have an ArgoCD application thats made up of a Deployment, an AWSSecret (DB credentials), a ServiceAccount, and 2 tasks (updating a CDN, and running database migrations) that need to run before the Deployment.

The tasks rely on the AWSSecret (to run the DB migrations) and the ServiceAccount (has IAM integration to access an S3 bucket for the CloudFront CDN). So, the order of operations for the deployment absolutely must be:

  1. Deploy ServiceAccount and AWSSecret
  2. Run tasks
  3. Deploy Deployment

From reading the ArgoCD docs, PreSync tasks are explicitly mentioned as an appropriate place to run database migrations. However, there's no way to create/update the AWSSecret before running a PreSync task. Similarly for the CDN update task, I need the ServiceAccount so that the task has the IAM permissions to access the S3 bucket. I can't imagine I'm unique in these types of requirements and as such, PreSync tasks seem pretty useless for anything that requires credentials.

So, I can't create the resources (AWSSecret and ServiceAccount) before a PreSync hook, but if I move the tasks to be Sync hooks, now everything is happening in the Sync Phase and I can use explicit Sync Waves to strictly order the operations. The docs note:

Hooks and resources are assigned to wave zero by default. The wave can be negative, so you can create a wave that runs before all other resources.

So, I set the resources to wave -1 to be explicit that they must run before the Sync hooks. While that quote suggests to me Deployment will run at some wave later than 0, that's not what I saw, so I set an explicit wave of 10 for the Deployment and now it correctly waits for the Sync hooks to complete before deploying.

endophage avatar Jul 26 '22 17:07 endophage

Thanks for clarifying, and it sounds like you found the solution. I think I would personally have tried making the Deployment part of Wave 1, rather than jump all the way to 10, or made the config resources Wave -2 and the Sync hooks -1, but that is personal preference.

Using Waves, you have basically created for yourself what you requested in the OP. I assume that means that this Issue is closed?

scalen avatar Jul 27 '22 05:07 scalen

At minimum I'd suggest there's a documentation update that needs to be made. It definitely sent me down the wrong path by suggesting PreSync hooks are where to run DB migrations.

I still think a better solution would be to have the "Resources" phase I proposed at the top, or even simply allow the PreSync phase to to apply resource changes (that's another weird problem imo, as far as configuration, I can only target a Hook, not a Phase).

The fact that ArgoCD can translate Helm hooks to run at appropriate times demonstrates a weird hole in ArgoCD's functionality. If you happen to use Helm, you can make ArgoCD do things it doesn't natively support for a Kustomize or raw manifest user, even though at face value, ArgoCD attempts to implement a similar hooks concept.

endophage avatar Jul 27 '22 17:07 endophage

Oh, the Helm Hooks directly translate into ArgoCD Hooks (actually, you lose some functionality from what Helm supports natively, instead of gaining any in ArgoCD).

What we do to support PreSync hook Jobs that need ConfigMaps and Secrets is have duplicates of those resources as higher priority Hooks, i.e. the ConfigMaps dedicated to the PreSync Jobs are themselves Hooks with hook weight -10 and, the PreSync Jobs have hook weight -5. Then we have mostly duplicate ConfigMaps and Secrets dedicated to the Deployments in the main Sync phase. Duplication in the code base is minimised via templating, so duplication in the cluster is seen as at worst a minor inconvenience.

The point is, there are ways to achieve what you're asking with the tools available. Adding another, more specific form of wave is of questionable value when you can simulate PreSync hooks between waves using Sync hooks in an intermediate wave.

scalen avatar Jul 27 '22 19:07 scalen

I hear you that there's a way to do it. My issue is that there's a huge gap between the docs, which don't address or provide any guidance on the complexities you've highlighted with needing duplicate ConfigMaps/Secrets. Being blunt, it often feels like a lot of projects omit these types of complexities from the docs to make the feature seem better than it is. I don't expect the info to be included in a baseline 101 doc, but it should be in some time of "advanced" section or use case example.

Alternatively, make the feature better, so that it's clearer and easier to make use of it. This is always the right answer, but I also get that there are competing priorities, so the docs update may be the more immediately achievable fix.

endophage avatar Aug 04 '22 17:08 endophage

I agree about documentation, in so much as I fully support a use-case that mixes Sync Waves and Sync Hooks to implement InterWave Hooks: it's a very common use-case to which there is a good (but not immediately obvious) solution that exactly implements your intuitive feature request.

In terms of my solution with duplicated ConfigMaps as PreSync Hooks, I don't think that should be documented explicitly by ArgoCD: it is a worse solution than what their system offers, only necessary because we have chosen to stick strictly within the functionality of Helm. It is therefore up to Helm, if anyone, to document a real-life example of Helm Hook usage.

I'm sure that they would be happy to review a documentation PR, as your proposal really shows the expressive power of Sync Waves 👍🏼

scalen avatar Aug 04 '22 21:08 scalen

Is it still so that there is no real solution to this? Also running migrations in PreSync which depend on a Secret that is created by Vault Operator. Also the Vault Operator depends on the ServiceAccount so i would need to actually be able to create the SA and Vault related resources before running the PreSync to make it work.

joonvena avatar Jun 06 '23 17:06 joonvena

Hi! Do you have any update on this? We are moving to argo and we are facing same issue.

  1. We create database, and other infra resources, for service with crossplane.
  2. we apply db migrations.
  3. service deployment.

our idea was to use wave: -1 to apply infra resources, then run migrations with presync hook, and finally deploy the service. But it seems it's not possible. Do you have any workaround ?

Thanks.

x4b1 avatar Jun 16 '23 12:06 x4b1

Does anyone have any updates on this? I have the exact same situation, but sync-waves with Sync hook dont help and deployment doesn't wait until hook is completed, it just deploys the entire application

TemKaa1337 avatar Oct 14 '23 10:10 TemKaa1337

When people are saying that it isn't possible to establish waves of resources and hooks, could they be more clear what they're trying, and how it doesn't work? Because the system as documented seems to support your stated use cases.

If the system as documented doesn't support your stated use cases, then that is a bug and should probably be reported as such. This issue is marked as an enhancement, and likely one that isn't needed, as the exact functionality described in the OP is a subset of the existing documented functionality. Again, if you are using the system as documented in an effort to produce the requested effect, and find it not working as documented, please file (or add to) a bug issue, not this "enhancement".

scalen avatar Oct 14 '23 10:10 scalen

When people are saying that it isn't possible to establish waves of resources and hooks, could they be more clear what they're trying, and how it doesn't work? Because the system as documented seems to support your stated use cases.

If the system as documented doesn't support your stated use cases, then that is a bug and should probably be reported as such. This issue is marked as an enhancement, and likely one that isn't needed, as the exact functionality described in the OP is a subset of the existing documented functionality. Again, if you are using the system as documented in an effort to produce the requested effect, and find it not working as documented, please file (or add to) a bug issue, not this "enhancement".

The document doesn't support the use case, but I think it doesn't mean it's a bug since that's how Argocd and even helm is designed this way.

PreSync or pre-install/pre-upgrade hook is only used for Configmap, secret, and Job. Argocd is supposed to recreate the resource before the next PreSync circle and/or delete it after it finishes, which is not ideal for a lot of operator CRD.

Let's say I want to have a pre-upgrade hook to migrate the database and make Argocd manage my PostgreSQL operator config too. The PreSync hook runs before any resource is created, which makes it hang forever. I had to manually create the config and also apply if there were any config changes, which destroyed the purpose of a git-managed infrastructure..

There are three ways to achieve this (or should be)

  • Have a way to create resources before PreSync hook (which is this issue)
  • Multi sources order. This way I can create another source folder with a higher priority than our helm chart
  • This is the current achievable way: create another application/applicationset with the resource and it will be created in parallel with my helm chart.

ntcong avatar Oct 16 '23 02:10 ntcong

My apologies, I have just taken another look at the documentation: they seem a lot clearer now than last time I looked! It seems that I had the priority of Sync Waves and Phases backwards (though I thought I remembered testing my original suggestion in this issue...)

If the priority of Sync Waves and Phases were reversed then my suggestion would be possible, as would all usecases supported by the current model: each wave gets a set of pre-sync, sync, post-sync and sync-fail hooks, all of which have to complete in the correct order before the next Wave can deploy, starting with its pre-sync Phase. That would be my recommended enhancement, then, as it doesn't deal in absolutes and involves no new interface elements, just a documentation tweak.

I think it would be functionally backwards compatible, too, assuming people have kept related hooks and manifests in the same waves (despite that not being functionally valuable at the moment 😬).

scalen avatar Oct 16 '23 06:10 scalen

I have similar use case:

  • I wanna run migrations in the PreSync hook, with a Job
  • I use a Secret (generated, for example, with ExternalSecret or SealedSecret) for these migrations
  • but I need to keep the Secret for the applications (eg Deployment or StatefulSet installed after the PreSync hook because it's used to connect to the database

I don't want to put the Secret in the PreSync hook with the delete policy set to BeforeHookCreation because I don't want it to be removed before a new sync (because it's used by the pods of the application).

Edit: for now I use the Sync hook for this migration but it's not ideal because if the job fails, my applications are still up and working.

sambonbonne avatar Jan 24 '24 10:01 sambonbonne

My suggestion would be the following:

  1. Presync hooks continue to work as normal, before any other sync wave
  2. If you add a wave number annotation to a pre-sync hook annotated resource, it will run the pre-sync hook after sync waves that came before it. so it could go sync wave (-1), presync hook (0 explicitly stated), sync wave (0), sync wave (1), etc.

dfsdevops avatar Feb 22 '24 00:02 dfsdevops

I just want to put an example here that might clear up some confusion. The solution suggested by @scalen (duplicating ConfigMaps) is actually very "light" duplication when using kustomize. In my kustomization.yaml I have:

configMapGenerator:
  - name: deployment-env
    envs:
      - deployment.env
  - name: db-migrate-job
    files:
      - db-migrate-job.sh
    # Run this in the pre-sync phase before db-migrate-job
    options:
      annotations:
        argocd.argoproj.io/hook: PreSync
        argocd.argoproj.io/sync-wave: "-10"
  - name: db-migrate-env
    envs:
      - deployment.env
    options:
      annotations:
        argocd.argoproj.io/hook: PreSync
        argocd.argoproj.io/sync-wave: "-10"

So when deployed to k8s, deployment-env and db-migrate-env will have the same content (and different annotations). However, I don't have the content duplicated on my side: kustomize handles that for me.

I actually somewhat prefer having the ConfigMaps used in the PreSync hooks to be different or isolated from the ConfigMaps in the running app. I'm not sure if I can explain this clearly, but in my head it feels "safer" because if there is an issue with the PreSync, then the old/current ConfigMaps used by the Deployment will still be there and will not be updated during the (broken) PreSync phase.

bkanuka avatar Mar 06 '24 12:03 bkanuka

This is a real smell in the design. Seems like a technical solution that was not fully mapped to the use cases.

I guess for now we need to duplicate secrets, cms, etc, across the phases.

barrykaplan avatar Mar 22 '24 14:03 barrykaplan

Its interesting that most (all?) examples of using hooks for work that requires credentials, the credentials are hard coded in the hook manifest. Thereby sidestepping the issue being discussed here.

https://redhat-scholars.github.io/argocd-tutorial/argocd-tutorial/04-syncwaves-hooks.html

barrykaplan avatar Mar 22 '24 14:03 barrykaplan

Example and/or a workaround used in similar scenario where a database migration is supposed to to be ran before any deployment changes:

1: Annotate every prerequisite (for that migration Job) object (i.e. Secret, ConfigMap, ServiceAccount): argocd.argoproj.io/sync-wave: "-2"

2: Annotate actual migration job with the following (Important - hook:Sync ensures that ArgoCD recognizes this Job as a fire-and-forget operation that runs until completion, as it's supposed to be. Otherwise, it will treat the Job as OutOfSync and continue syncing it repeatedly):

argocd.argoproj.io/hook: Sync
argocd.argoproj.io/sync-wave: "-1"

3: Leave any other object un-annotated with hook/sync-wave values and let ArgoCD handle the logic with already existing sequence: https://github.com/argoproj/gitops-engine/blob/bc9ce5764fa306f58cf59199a94f6c968c775a2d/pkg/sync/sync_tasks.go#L27-L66

I have also observed above sequence is only valid for the chart in current context. If the Application in question is multi-sourced (multiple charts), manifests from other charts will be iterated using default sequence

evaldas-kazakas avatar May 02 '24 12:05 evaldas-kazakas