
Application in any namespace | Synced with NO resources deployed

Open ironoa opened this issue 3 years ago • 45 comments

Describe the bug

Maybe someone can help me debug an issue I'm facing:

I have an ArgoCD instance deployed with Helm: chart version 5.16.2, app version v2.5.4.

I'm trying to enable the application-in-any-namespace feature. Basically, I added application.namespaces: '*' as a configs.params value.
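A minimal sketch of the Helm values described above, assuming the chart renders every key under configs.params into the argocd-cmd-params-cm ConfigMap:

```yaml
# values.yaml (sketch) for the argo-cd Helm chart
configs:
  params:
    # Read by argocd-server and argocd-application-controller at startup
    # (exposed to them as the ARGOCD_APPLICATION_NAMESPACES env var)
    application.namespaces: "*"
```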

When I deploy an Application in a namespace other than the one where ArgoCD is installed, the Application is now recognized (thanks to the config above), but it immediately becomes Synced without deploying anything else. Apparently, I'm also not getting any errors anywhere.

FYI

  • I defined a dedicated AppProject with the spec.sourceNamespaces: '*' config.
  • The very same Application deployed in the argocd namespace (my standard one) produces the expected result.

Any hints? Thanks.

To Reproduce

  • Deploy the ArgoCD Helm chart, setting application.namespaces: '*' in the configs.params value
  • Deploy an AppProject with the spec.sourceNamespaces: '*' config
  • Deploy an Application (using the AppProject from the previous step) in a namespace other than the Argo CD one
  • The Application is recognized, but no further resources are deployed
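A sketch of the reproduction manifests, assuming a project named test and the test-argo-namespace namespace used later in this thread; the repoURL and path are hypothetical placeholders. Note that the AppProject itself stays in the argocd namespace; only the Application lives elsewhere.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: test
  namespace: argocd            # AppProjects stay in the Argo CD namespace
spec:
  sourceNamespaces:
    - "*"                      # namespaces allowed to host Applications of this project
  sourceRepos:
    - "*"
  destinations:
    - namespace: test-argo-namespace
      server: https://kubernetes.default.svc
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: test-argo-namespace
  namespace: test-argo-namespace   # outside the argocd namespace
spec:
  project: test
  source:
    repoURL: https://example.com/my-org/my-repo.git   # hypothetical
    targetRevision: HEAD
    path: manifests                                   # hypothetical
  destination:
    server: https://kubernetes.default.svc
    namespace: test-argo-namespace
```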

Expected behavior

When I deploy an Application in a namespace other than the one where ArgoCD is installed, its resources (pods, svc, ...) should be deployed, rather than the Application simply being marked as Synced.

The very same Application deployed in the argocd namespace (my standard one) does produce this expected result.

Screenshots

[screenshot]

Additional context

I also opened an issue here

ironoa, Dec 09 '22 14:12

@ironoa Have you tried deploying with

server:
  extraArgs:
    - --application-namespaces="*"

and

controller:
  extraArgs:
    - --application-namespaces="*"

in the Helm chart?

sommerit, Dec 19 '22 17:12

application.namespaces should be picked up by both the server and the controller. Did you restart the components after applying that config?

Aside: try to very, very quickly narrow the *s, especially in the AppProject, to something more specific once things are up and running. 🙂
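A sketch of what narrowing the wildcards could look like, assuming the hypothetical namespaces test-argo-namespace and team-a-*: application.namespaces takes a comma-separated list in which glob patterns are allowed, and the AppProject's sourceNamespaces accepts the same kind of patterns. The two documents below belong in separate files (Helm values and a Kubernetes manifest).

```yaml
# values.yaml (sketch): only watch these namespaces for Applications
configs:
  params:
    application.namespaces: "test-argo-namespace,team-a-*"
---
# AppProject (sketch, other spec fields omitted):
# only accept Applications created in these namespaces
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: test
  namespace: argocd
spec:
  sourceNamespaces:
    - test-argo-namespace
    - team-a-*
```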

crenshaw-dev, Dec 19 '22 17:12

@crenshaw-dev Indeed, the application.namespaces parameter is picked up by both the controller and the server.

In particular, so far:

  • I verified that both the server and the controller pods have the ARGOCD_APPLICATION_NAMESPACES environment variable set (exec'd into the pods and echoed it)
  • I set application.namespaces to *
  • I set spec.sourceNamespaces to *, among other fields, in the AppProject:
    • [screenshot]
  • I also tried setting application.namespaces and spec.sourceNamespaces to a specific namespace (another test) rather than *
  • I restarted everything multiple times
  • I also tried editing the argocd-server ClusterRole to make it similar to the namespaced argocd-server Role (maybe I need to do something smarter here?)
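For context on the env variable check above, a sketch of how the install manifests are assumed to wire the setting into the server and controller pods, via a configMapKeyRef on argocd-cmd-params-cm. Because such a value is resolved only when the container starts, a ConfigMap change needs a pod restart to take effect.

```yaml
# Fragment of the argocd-server / argocd-application-controller container spec (sketch)
env:
  - name: ARGOCD_APPLICATION_NAMESPACES
    valueFrom:
      configMapKeyRef:
        name: argocd-cmd-params-cm
        key: application.namespaces
        optional: true   # the pod still starts if the key is absent
```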

The outcome is always:

  • the App is detected, whether I create it via App of Apps (root in argocd, app in test-namespace) or deploy it manually in test-namespace

  • the manifest is detected

    • [screenshot]
  • everything looks healthy, but nothing gets deployed

    • [screenshot]
  • the same app deployed in the argocd namespace works as expected

    • [screenshot]
  • PS: if the ARGOCD_APPLICATION_NAMESPACES env vars are not set properly, the app is not even detected (tested; as expected)

The biggest problem is that I haven't been able to spot an error message anywhere so far. Any ideas on how to debug this?

ironoa, Dec 21 '22 22:12

Was able to make v2.5.5 work by adding the application.namespaces to the argocd-cmd-params-cm.

I had no choice other than killing the pods to make them reload the config. I might look into adopting https://github.com/stakater/Reloader soon.
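A sketch of the Reloader approach, assuming Stakater Reloader is installed in the cluster: its configmap.reloader.stakater.com/reload annotation asks Reloader to roll the annotated workload whenever the named ConfigMap changes. The same annotation would go on the argocd-application-controller StatefulSet.

```yaml
# Metadata fragment to merge into the argocd-server Deployment (e.g. as a patch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
  namespace: argocd
  annotations:
    # Reloader performs a rolling restart when this ConfigMap changes
    configmap.reloader.stakater.com/reload: "argocd-cmd-params-cm"
```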

Hope that helps!

Alegrowin, Dec 21 '22 22:12

@sommerit Just to be meticulous, I also tried setting --application-namespaces="*" via server.extraArgs and controller.extraArgs in the Helm chart: not working.

ironoa, Dec 21 '22 22:12

Using Kustomize, it would look something like this:

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

bases:
  - github.com/argoproj/argo-cd//manifests/ha/cluster-install?ref=v2.5.5

configMapGenerator:
  - behavior: merge
    literals:
      - server.insecure="true"
      - application.namespaces="*"
    name: argocd-cmd-params-cm
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: test
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  clusterResourceWhitelist:
    - group: '*'
      kind: '*'
  destinations:
    - namespace: 'test-dev'
      server: 'https://kubernetes.default.svc'
    - namespace: 'test-uat'
      server: 'https://kubernetes.default.svc'
    - namespace: 'test-prd'
      server: 'https://kubernetes.default.svc'
  sourceRepos:
    - '*'
  sourceNamespaces:
  - test-*

Alegrowin, Dec 21 '22 22:12

@ironoa Did you delete your pods after changing the ConfigMap?

If not, try this:

kubectl scale statefulset/argocd-application-controller --replicas=0  -n argocd 
kubectl scale deployment/argocd-server --replicas=0  -n argocd 

kubectl scale statefulset/argocd-application-controller --replicas=1  -n argocd 
kubectl scale deployment/argocd-server --replicas=1  -n argocd 

Alegrowin, Dec 21 '22 22:12

@Alegrowin Not sure what you mean by killing the pods: I already restarted every pod with kubectl delete --all pods in the argocd namespace, no luck.

[screenshot]

Btw, thanks for the support; I really don't know how to debug this (yet).

ironoa, Dec 21 '22 22:12

Can you run kubectl get configmap argocd-cmd-params-cm -n argocd -o yaml

and check that it contains:

data:
  application.namespaces: '*'
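For reference, a sketch of the whole ConfigMap with only this key set; any other keys the chart or install manifests may add are omitted:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cmd-params-cm
    app.kubernetes.io/part-of: argocd
data:
  # Namespaces (comma-separated, globs allowed) whose Applications are reconciled
  application.namespaces: "*"
```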

Alegrowin, Dec 21 '22 22:12

@Alegrowin I tried scaling the components down and back up just to be meticulous, but as I already said above, one of the first things I did was exec into the pods and confirm, by echoing it, that the ARGOCD_APPLICATION_NAMESPACES env variable is set for both the server and the controller.

ironoa, Dec 21 '22 22:12

Maybe upgrading to Helm chart 5.16.9, which uses v2.5.5, might help?

Alegrowin, Dec 21 '22 22:12

@Alegrowin Here is the output of:

kubectl get configmap argocd-cmd-params-cm -n argocd -o yaml | grep application.namespaces

[screenshot]

ironoa, Dec 21 '22 22:12

@Alegrowin Regarding upgrading to Helm chart 5.16.9: I'm there already, both with the chart and the Argo CD version.

[screenshot]

ironoa, Dec 21 '22 22:12

@Alegrowin Outside of this scenario, what are the possible causes for an app being present and considered healthy, but with nothing deployed underneath?

So far it happened to me when:

  • the custom values of a plugin were misconfigured (wrong indentation, etc.) => but that led to an empty manifest and to errors from the repo-server sidecars... FYI, weirdly enough, from the UI the app looked healthy (with nothing deployed, or warning you that syncing would delete everything)

In this case the manifest is there and looks as it should, and indeed the very same app deployed in the argocd namespace works.

ironoa, Dec 21 '22 22:12

@Alegrowin I've just spotted these 2 errors:

"time=\"2022-12-21T22:29:47Z\" level=error msg=\"Unable to create audit event: events is forbidden: User \\\"system:serviceaccount:argocd:argocd-server\\\" cannot create resource \\\"events\\\" in API group \\\"\\\" in the namespace \\\"test-argo-namespace\\\"\" application=test-argo-namespace dest-namespace=test-argo-namespace dest-server=\"https://kubernetes.default.svc\" reason=ResourceDeleted type=Normal\n"
"time=\"2022-12-21T22:29:47Z\" level=error msg=\"finished streaming call with code Unknown\" error=\"error getting app resource tree: cache: key is missing\" grpc.code=Unknown grpc.method=WatchResourceTree grpc.service=application.ApplicationService grpc.start_time=\"2022-12-21T22:29:35Z\" grpc.time_ms=12039.218 span.kind=server system=grpc\n"

Both come from the server component; nothing else matches level=error. PS: test-argo-namespace is exactly the namespace I'm using to test/debug this issue (referenced in the first error log).

ironoa, Dec 22 '22 07:12

@ironoa I was able to reproduce it; here is something that might help.

Using Kustomize, I basically patched the argocd-server ClusterRole to look like the argocd-server Role, since it now manages resources in other namespaces as well.

Since you are using Helm, the chart template will need to change (a feature flag, maybe?) so that sa/argocd-server can update resources in other namespaces.

This ClusterRole needs to look like this Role: https://artifacthub.io/packages/helm/argo/argo-cd?modal=template&template=argocd-server/role.yaml
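A sketch of that ClusterRole change with Kustomize, assuming the cluster-install manifests (which define a ClusterRole named argocd-server). The JSON 6902 patch appends rules so that argocd-server can create the audit events rejected in the "events is forbidden" error above and manage Applications outside the argocd namespace. Copying the Role verbatim would also grant cluster-wide access to secrets and configmaps, so this sketch keeps to two rules; treat it as a starting point, not a vetted RBAC policy.

```yaml
# kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - github.com/argoproj/argo-cd//manifests/ha/cluster-install?ref=v2.5.5

patches:
  - target:
      group: rbac.authorization.k8s.io
      version: v1
      kind: ClusterRole
      name: argocd-server
    patch: |-
      # Let argocd-server record audit events in any namespace
      - op: add
        path: /rules/-
        value:
          apiGroups: [""]
          resources: ["events"]
          verbs: ["create", "list"]
      # Let argocd-server manage Applications outside the argocd namespace
      - op: add
        path: /rules/-
        value:
          apiGroups: ["argoproj.io"]
          resources: ["applications"]
          verbs: ["create", "get", "list", "watch", "update", "delete", "patch"]
```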

Alegrowin, Dec 22 '22 18:12

And also this, in case you are using Kustomize.

@crenshaw-dev

Alegrowin, Dec 22 '22 18:12

@Alegrowin Thanks for reproducing it. FYI, if you are right about patching the argocd-server ClusterRole, then this commit/PR was never enough and needs to be extended...

Still, something is missing... I also tried restarting all the pods and recreating the Application, but it doesn't work: the Application looks healthy but doesn't deploy anything...

LAST SEEN   TYPE     REASON            OBJECT                             MESSAGE
59m         Normal   ResourceUpdated   application/test-argo-namespace    Updated sync status:  -> Synced
59m         Normal   ResourceUpdated   application/test-argo-namespace    Updated health status:  -> Healthy

Here are my Role and (modified) ClusterRole descriptions:

#role
PolicyRule:
  Resources                    Non-Resource URLs  Resource Names  Verbs
  ---------                    -----------------  --------------  -----
  applications.argoproj.io     []                 []              [create get list watch update delete patch]
  applicationsets.argoproj.io  []                 []              [create get list watch update delete patch]
  appprojects.argoproj.io      []                 []              [create get list watch update delete patch]
  configmaps                   []                 []              [create get list watch update patch delete]
  secrets                      []                 []              [create get list watch update patch delete]
  events                       []                 []              [create list]
#cluster role
PolicyRule:
  Resources                    Non-Resource URLs  Resource Names  Verbs
  ---------                    -----------------  --------------  -----
  applications.argoproj.io     []                 []              [create get list watch update delete patch]
  applicationsets.argoproj.io  []                 []              [create get list watch update delete patch]
  appprojects.argoproj.io      []                 []              [create get list watch update delete patch]
  configmaps                   []                 []              [create get list watch update patch delete]
  secrets                      []                 []              [create get list watch update patch delete]
  events                       []                 []              [create list]

Am I missing something?

ironoa, Dec 23 '22 09:12

Regarding the second error above ("error getting app resource tree: cache: key is missing"), it comes from here:

=> https://github.com/argoproj/argo-cd/blob/master/server/application/application.go#L1383

ironoa, Dec 23 '22 09:12

Something similar happens in my Argo CD setup. I'm using AVP to replace placeholders in Secrets with values from Vault. AVP has a discover command to find Helm charts in which it could replace placeholders. When a Helm chart is detected in an ArgoCD Application, ArgoCD syncs the application with no resources. [screenshot]

lusu007, Dec 28 '22 15:12

Hi folks, sorry for chiming in so late.

Have you set the resource tracking method to annotation or annotation+label? I believe what you see is the following:

  • Two applications with a similar name, but deployed to different namespaces
  • One of them (in argocd namespace) is synced and has resources
  • The other (in different namespace) sees the resources from the other application (by label tracking), but those are not permitted in the app's AppProject and thus not displayed

This has not been properly documented in 2.5, but will be in 2.6:

https://argo-cd.readthedocs.io/en/latest/operator-manual/app-any-namespace/#switch-resource-tracking-method
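A minimal sketch of switching the tracking method, assuming a plain argocd-cm ConfigMap; with Helm, the same key would presumably go under the chart's configs.cm values:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  # One of: label (default), annotation, annotation+label
  application.resourceTrackingMethod: annotation
```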

jannfis, Dec 28 '22 15:12

Hey @jannfis, thank you for your response. I had not changed the setting until now. But after changing it to annotation+label I, unfortunately, see no difference in ArgoCD's behavior.

I set CreateNamespace=true in my setup and the namespaces are also not created.
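For reference, CreateNamespace=true is a sync option set on the Application itself; a fragment of the relevant part of the spec (other fields omitted):

```yaml
# Fragment of an Application spec (sketch)
spec:
  syncPolicy:
    syncOptions:
      # Create the destination namespace if it does not exist yet
      - CreateNamespace=true
```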

lusu007, Dec 28 '22 15:12

Hi @jannfis, thanks for your answer.

To clarify:

Regarding "Two applications with a similar name, but deployed to different namespaces": this assumption does not match my case. I have just one Application, with a unique name, deployed in a unique namespace (also tested with a namespace named "t", very short). Anyway, following your advice, I have carried out multiple tests with the resource tracking method set to annotation, annotation+label, and label (the default). Unfortunately, that was not the solution.

FYI, I've been trying to be very meticulous here. As a complementary test (only one app at a time), I took the very same Application and deployed it in the argocd namespace, to confirm that the problem is not the Application definition but just the fact that it is deployed in a custom namespace. Full test description here

Works: [screenshot]

Doesn't work (detected, synced and healthy, with no resources deployed): [screenshot]

ironoa, Dec 30 '22 22:12

Hey @lusu007, regarding your AVP setup syncing with no resources: not sure this is related, although the outcome is very similar. Are you actually deploying the Application in a namespace other than the default one (i.e. argocd)?

I'm using AVP as well and I faced the same issue you mentioned (again, if I'm right, your issue is probably not related to this one). To debug: if you are using the sidecar approach, check the logs of the plugin sidecar container; you might find an error there. If you are using the ConfigMap approach, check the repo-server pod logs. FYI, I had errors in there (wrong indentation in the plugin inline values, wrong Vault setup, ...), and the outcome was indeed a healthy and synced status (error handling with plugins could be improved, imho). I actually mentioned that above.

Just to clarify, for this Application I'm obviously not using any plugin; I'm focused only on the application-in-any-namespace feature. The main problem here is that I don't see any errors anywhere, so I cannot really debug the issue.

ironoa, Dec 30 '22 22:12

Hey @ironoa, thank you for your detailed answers.

I tried to debug AVP further with a colleague. We found out that Helm has read-only permissions to the cache folder. After moving the cache folders with the Helm env variables (HELM_CACHE_HOME, HELM_CONFIG_HOME, HELM_DATA_HOME) to /tmp/helm/... the permissions issue was fixed. After that, all that was left to do was to either add the repositories in the init script or run helm dep update instead of helm dep build.
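A sketch of the Helm directory fix, assuming the AVP plugin runs as a sidecar container of the repo-server; the container name and the exact subdirectories under /tmp/helm are hypothetical:

```yaml
# Fragment of the argocd-repo-server Deployment pod spec (sketch)
containers:
  - name: avp-helm            # hypothetical sidecar name
    env:
      # Point Helm at a writable location instead of the read-only defaults
      - name: HELM_CACHE_HOME
        value: /tmp/helm/cache
      - name: HELM_CONFIG_HOME
        value: /tmp/helm/config
      - name: HELM_DATA_HOME
        value: /tmp/helm/data
```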

Now everything works. Thanks a lot for the detailed help!

A working AVP config:

apiVersion: argoproj.io/v1alpha1
kind: ConfigManagementPlugin
metadata:
  name: argocd-vault-plugin-helm
spec:
  # https://argocd-vault-plugin.readthedocs.io/en/stable/usage/#with-helm
  allowConcurrency: true
  discover:
    find:
      command:
        - sh
        - "-c"
        - "find . -name 'Chart.yaml' && find . -name 'values.yaml'"
  init:
    command:
      - sh
      - "-c"
      - "helm dependency update"
  generate:
    command:
      - sh
      - "-c"
      - "helm template $ARGOCD_APP_NAME --include-crds . | argocd-vault-plugin generate -"
  lockRepo: false

lusu007, Jan 03 '23 18:01

version v2.5.9, still facing the issue...

ironoa, Jan 30 '23 09:01

Sorry for chiming in again a little late.

Just a question from the comments I've read so far: do all the affected apps have manifests generated by a plugin?

jannfis, Jan 31 '23 00:01

@ironoa mentioned earlier that this Application does not use any plugin and that the focus is only on the https://github.com/argoproj/argo-cd/pull/9755 feature, with no errors visible anywhere.

OK, so probably not :)

jannfis, Jan 31 '23 00:01

@ironoa Referring to https://github.com/argoproj/argo-cd/issues/11638#issuecomment-1362175890, are there really no resources deployed to the cluster, or are they just not visible from the UI?

jannfis, Jan 31 '23 00:01

Also, what's weird about the two screenshots is that the one that is not working lacks the sync status field, i.e.:

[screenshot]

vs

[screenshot]

jannfis, Jan 31 '23 01:01