argo-cd icon indicating copy to clipboard operation
argo-cd copied to clipboard

feat: Combine files in a multisource repo (#12471 #12485)

Open gczuczy opened this issue 2 years ago • 55 comments

Fixes #12471 Fixes #12476 Fixes #7189 Fixes #13220 Fixes #14521

Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

Checklist:

  • [x] Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • [x] The title of the PR states what changed and the related issues number (used for the release note).
  • [x] I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • [ ] I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • [x] Does this PR require documentation updates?
  • [x] I've updated documentation as required by this PR.
  • [x] Optional. My organization is added to USERS.md.
  • [x] I have signed off all my commits as required by DCO
  • [x] I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • [x] My build is green (troubleshooting builds).

gczuczy avatar Feb 17 '23 10:02 gczuczy

Codecov Report

Attention: Patch coverage is 46.07330% with 206 lines in your changes are missing coverage. Please review.

Project coverage is 49.46%. Comparing base (2b6b9bf) to head (1b8cb9f). Report is 6 commits behind head on master.

:exclamation: Current head 1b8cb9f differs from pull request most recent head acf443f. Consider uploading reports for the commit acf443f to get more accurate results

Files Patch % Lines
reposerver/repository/repository.go 40.65% 156 Missing and 25 partials :warning:
util/app/path/path.go 67.92% 13 Missing and 4 partials :warning:
util/argo/argo.go 72.72% 4 Missing and 2 partials :warning:
pkg/apis/application/v1alpha1/types.go 0.00% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12508      +/-   ##
==========================================
+ Coverage   49.45%   49.46%   +0.01%     
==========================================
  Files         273      271       -2     
  Lines       48662    48140     -522     
==========================================
- Hits        24066    23813     -253     
+ Misses      22235    21960     -275     
- Partials     2361     2367       +6     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Feb 17 '23 10:02 codecov[bot]

There's a lot of code duplicated from repository.go:679 (valueFiles logic) to the new copyFrom logic. I think these should be refactored to call common functions. Also, they way that both parts are checking out repos and deferring to closer functions for cleanup individually is something I don't like the idea of. Probably repos should be checked out once, cached, and when it's time for the manifest generation released and cleaned up.

I think once the copyFrom logic is there and I both have a clearer view of how stuff works and I'm more comfortable with the logics here, I will see how can I clean this mess up.

And I'm also planning to address the LintGo failing test later.

gczuczy avatar Feb 17 '23 15:02 gczuczy

I've got the following errors while running test-local:

# github.com/argoproj/argo-cd/v2/controller
controller/state.go:191:3: github.com/sirupsen/logrus.Debugf format %s has arg source of wrong type github.com/argoproj/argo-cd/v2/pkg/apis/application/v1alpha1.ApplicationSource
(...)
# github.com/argoproj/argo-cd/v2/util/argo
util/argo/argo.go:417:14: fmt.Sprintf format %s has arg source of wrong type github.com/argoproj/argo-cd/v2/pkg/apis/application/v1alpha1.ApplicationSource

And the String operator for ApplicationSources is defined at pkg/apis/application/v1alpha1/generated.pb.go:16086:

func (this *ApplicationSource) String() string {
  if this == nil {
    return "nil"
  }

IIRC I haven't fiddled with this part, and I'm not really sure what might have caused those test errors.

gczuczy avatar Feb 21 '23 09:02 gczuczy

I think the PR looks feature-complete right now, unit tests are also passing. The place I couldn't find the tests for are the changes in the reposerver/repository/repository.go:runManifestGenAsync function, that's where the logic responsible for this change is called. Everything else should be covered by the unit tests.

Docs are still to be adjusted.

gczuczy avatar Feb 24 '23 13:02 gczuczy

Docs are updated as well.

@ishitasequeira, could you please review it?

gczuczy avatar Feb 24 '23 16:02 gczuczy

Docs are updated as well.

@ishitasequeira, could you please review it?

Will review it today.

ishitasequeira avatar Feb 24 '23 16:02 ishitasequeira

This is looking really cool!

We'll probably want some e2e tests to validate a few base use cases for this feature.

crenshaw-dev avatar Feb 24 '23 21:02 crenshaw-dev

One thought that comes to my mind is, are we allowing a nested access of referenced sources for from? For example, something like below:

sources:
  - repoURL: http://my/repo.git
     targetRevision: master
     ref: values1
  - repoURL: http://my/repo-one.git
     targetRevision: 42.0.1
     ref: values2
     from:
       - ref: $values1
	 sourcePath: "dev/values.yaml"
          destinationPath: "env-values.yaml"
  - repoURL: http://my/repo-two.git
     targetRevision: 42.0.1
     from:
        - ref: $values2
	   sourcePath: "env-values.yaml"
           destinationPath: "dev/env-values.yaml"

I think this can lead to a lot of concerns. We should probably not allow ref and from in the same source. WDYT?

ishitasequeira avatar Feb 24 '23 23:02 ishitasequeira

One thought that comes to my mind is, are we allowing a nested access of referenced sources for from? For example, something like below:

sources:
  - repoURL: http://my/repo.git
     targetRevision: master
     ref: values1
  - repoURL: http://my/repo-one.git
     targetRevision: 42.0.1
     ref: values2
     from:
       - ref: $values1
	 sourcePath: "dev/values.yaml"
          destinationPath: "env-values.yaml"
  - repoURL: http://my/repo-two.git
     targetRevision: 42.0.1
     from:
        - ref: $values2
	   sourcePath: "env-values.yaml"
           destinationPath: "dev/env-values.yaml"

I think this can lead to a lot of concerns. We should probably not allow ref and from in the same source. WDYT?

Currently that does make sense, however I'm considering another feature, where the output of a source can be referenced in another source building on this feature. Adding something like a saveOutput to the ApplicationSource, then being able to reference that both in the helm and from sections with something like $ref/$output. This with render the current mechanics into stage-like steps and interaction between them would be allowed. Moreover the builtin kustomize and helm processing facilities could coexists with standalone plugins. And right there disallowing having copy and ref at the same time would effectively block that feature.

I think, as of right now, putting a warning into the documentation is more effective.

Also, I think adding O_EXCL helps here with overwriting files:

O_EXCL Ensure  that this call creates the file: if this flag is specified in conjunction with O_CREAT, and pathname already exists, then open() fails with the error EEXIST```
This would solve accidental overwrites. Just tested this, and open indeed fails if the file already exists.

The question is, do we want to explicitly support transitive copies? I don't think that's something we should be factoring in.

gczuczy avatar Feb 27 '23 07:02 gczuczy

I don't understand why the tests are broken, before i've synced the branch to master, everything was green.

gczuczy avatar Mar 03 '23 11:03 gczuczy

I don't really understand why the e2e test TestNamespacedAppLogs is failing. I shouldn't have touched anything related to that test's codepath. Also, when I'm reproducing it locally it fails differently:

time="2023-03-08T12:51:45+01:00" level=error msg="`../../dist/argocd app create argocd-e2e-external/test-namespaced-app-logs --repo http://argocd-e2e-server:9081/argo-e2e/testdata.git --dest-server https://kubernetes.default.svc --path guestbook-logs --project default --dest-namespace argocd-e2e--test-namespaced-app-logs-hthfa --server 127.0.0.1:4443 --auth-token *** --insecure` failed exit status 20: time=\"2023-03-08T12:51:45+01:00\" level=fatal msg=\"rpc error: code = Unknown desc = namespace 'argocd-e2e-external' is not permitted\"" execID=ca1ee
    actions.go:195: failed expectation: error

And the most interesting part is, I've reverted to a commit prior to my changes, and it still fails the same way as shown above:

$ git status
HEAD detached at 9fb7847f
$ git log 9fb7847f | head -7
commit 9fb7847f4e054b05b222a6cfabe4cd821d90f6e5
Author: Noah Krause <--->
Date:   Tue Mar 7 16:03:50 2023 -0500

    fix: typo in doc link (#12744)

    Signed-off-by: Noah Krause <--->

gczuczy avatar Mar 08 '23 11:03 gczuczy

@gczuczy that test is flaky. I have an open PR to try to fix that.

crenshaw-dev avatar Mar 09 '23 20:03 crenshaw-dev

I'm eagerly waiting for the merge of this PR. Keep up the good work!

mircyb avatar Mar 14 '23 11:03 mircyb

Here's a simplistic test example:

kubectl apply -f https://raw.githubusercontent.com/gczuczy/argocd-copyfrom-test/main/ArgoApp.yaml

This application has a kustomization in one repo, and one of the kustomization.yaml's inputs are in another repo, by environment. The involved repos are: https://github.com/gczuczy/argocd-copyfrom-test https://github.com/gczuczy/argocd-copyfrom-test-envs

There was another internal test, which is rather complicated, it deploys an ArgoCD ecosystem with a superchart, ArgoCD and ArgoWorkflows as a subchart, with a fair amount of secrets, for which we're using AVP to acquire. The gist of that secret are the following two snippets:

For the Application spec, here's the important part:

sources:
  - repoURL: 'https://valu.es/repo/sitory.git'
    targetRevision: master
    ref: valuesrepo
  - repoURL: 'https:/artifact.repo/testing'
    targetRevision: 0.42.24
    plugin:
      env:
        - name: VALUES
          value: env-values.yaml
    chart: our-argo-superchart
    from:
      - sourcePath: $valuesrepo/dev/values.yaml
        destinationPath: env-values.yaml

Here the chart has nil values for all the secrets, and the values repo's values.yaml has <placeholder>s all over the place. Even for the AVP credentials.

The environment-specific values.yaml is passed to the AVP plugin through the $ARGOCD_ENV_VALUES envvars, and the plugin definition has the following spec.generate.command:

      generate:
        command:
          - sh
          - "-c"
          - "/usr/local/bin/helm template $ARGOCD_APP_NAME -n $ARGOCD_APP_NAMESPACE --values $ARGOCD_ENV_VALUES . | /usr/local/bin/argocd-vault-plugin generate - -s namespace:avp-config-cm"

So, helm is using the environment-specific values from a separate repo and its output is fed to AVP, which is taking care of the <placeholder> replacements, and nicely adding the environment-specific secrets all over the place. This pattern deploys argocd itself in our internal testing environment.

gczuczy avatar Mar 16 '23 17:03 gczuczy

@gczuczy looks great!

lippertmarkus avatar Apr 18 '23 12:04 lippertmarkus

Thank you @ishitasequeira for the review and taking care of this. Merging this will be very much appreciated.

Please allow me to hint at a little work flow for future iterations of this kind:

You could use GitHub Markdown's suggestion feature ¹ ² ³ to line comments such as yours in https://github.com/argoproj/argo-cd/pull/12508#discussion_r1177852293

The proposed change will be directly committable to the branch. It appears this is a very convenient way to communicate code change proposals between reviewers and contributors, which I can only recommend to adopt for convenience of both sides in a review.

almereyda avatar Apr 26 '23 13:04 almereyda

@ishitasequeira Thank you, updated the branch, and fixed that nit as well. Sorry that it got slipped.

gczuczy avatar Apr 26 '23 13:04 gczuczy

@crenshaw-dev I don't know what to do about the kubeversion-specific e2e tests, those sometimes are failing, and at the next sync to master they are succeeding again. Could you please help out with that one?

gczuczy avatar Apr 26 '23 16:04 gczuczy

cc @gczuczy also cc @jkroepke for helm-secrets implementation

My team and I have taken the chance to test out this feature in our cluster, as we've been wanting to use helm-secrets with Multi-Source apps, and we've had great results! The repository pulling seems to work fine, and we had no trouble getting it working with the helm-secrets schema.

Our application manifest looked like this:


apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: data-dog
  namespace: argo-cd
spec:
  project: infrastructure
  sources:
    - repoURL: https://github.com/org/repo.git
      targetRevision: HEAD
      ref: values
      
    - repoURL: https://helm.datadoghq.com
      targetRevision: 3.25.1
      chart: datadog
      helm:
        valueFiles:
          - app-values/values.yaml
          - secrets://app-values/secrets.yaml
          
          - app-values/env/values.yaml
      from:
        - sourcePath: "$values/kube/apps/data-dog"
          destinationPath: "app-values"
  destination:
    server: "https://kubernetes.default.svc"
    namespace: data-dog
  syncPolicy:
    automated:
      selfHeal: true
      prune: true
    syncOptions:
      - CreateNamespace=true

One thing we're keen to do is rebase this PR against the release-2.7 branch, and see how that affects things, so stay tuned for that.

Perhaps unrelated, but after the initial testing we found ourselves blocked by a curious timeout issue when deploying Helm apps. The model we're using is standard app-of-apps model, with one application manifest that is deployed using kubectl apply -f and references an argo/ dir, containing a chart and the templates (app.yamls). The directory structure is similar to this:

repo
│
└───argo
│   │   Chart.yaml
│   │   values.yaml
│   │
│   └───templates
│       │   app1.yaml
│       │   app2.yaml
│       │   ...
│   
└───base
│   │   base-app.yaml
│   │   ...

We soon found that the deployment of our base-app was not successful, failing due to a timeout of the helm template run at Sync-time.


ComparisonError: rpc error: code = 
Unknown desc = 
Manifest generation error (cached): 

helm template . --name-template infrastructure-apps --namespace argo-cd --kube-version 1.23 --values /tmp/afaead53-5c5a-46c6-80d8-5f8be43dcda4 --api-versions admissionregistration.k8s.io/v1 --api-versions admissionregistration.k8s.io/v1/MutatingWebhookConfiguration --api-versions admissionregistration.k8s.io/v1/ValidatingWebhookConfiguration --api-versions apiextensions.k8s.io/v1 --api-versions apiextensions.k8s.io/v1/CustomResourceDefinition --api-versions apiregistration.k8s.io/v1 --api-versions apiregistration.k8s.io/v1/APIService --api-versions apps/v1 --api-versions apps/v1/ControllerRevision --api-versions apps/v1/DaemonSet --api-versions apps/v1/Deployment --api-versions apps/v1/ReplicaSet --api-versions apps/v1/StatefulSet --api-versions argoproj.io/v1alpha1 --api-versions argoproj.io/v1alpha1/AppProject --api-versions argoproj.io/v1alpha1/Application --api-versions argoproj.io/v1alpha1/ApplicationSet --api-versions autoscaling/v1 --api-versions autoscaling/v1/HorizontalPodAutoscaler --api-versions autoscaling/v2 --api-versions autoscaling/v2/HorizontalPodAutoscaler --api-versions autoscaling/v2beta1 --api-versions autoscaling/v2beta1/HorizontalPodAutoscaler --api-versions autoscaling/v2beta2 --api-versions autoscaling/v2beta2/HorizontalPodAutoscaler --api-versions batch/v1 --api-versions batch/v1/CronJob --api-versions batch/v1/Job --api-versions batch/v1beta1 --api-versions batch/v1beta1/CronJob --api-versions certificates.k8s.io/v1 --api-versions certificates.k8s.io/v1/CertificateSigningRequest --api-versions coordination.k8s.io/v1 --api-versions coordination.k8s.io/v1/Lease --api-versions crd.k8s.amazonaws.com/v1alpha1 --api-versions crd.k8s.amazonaws.com/v1alpha1/ENIConfig --api-versions discovery.k8s.io/v1 --api-versions discovery.k8s.io/v1/EndpointSlice --api-versions discovery.k8s.io/v1beta1 --api-versions discovery.k8s.io/v1beta1/EndpointSlice --api-versions events.k8s.io/v1 --api-versions events.k8s.io/v1/Event --api-versions events.k8s.io/v1beta1 --api-versions events.k8s.io/v1beta1/Event --api-versions flowcontrol.apiserver.k8s.io/v1beta1 --api-versions flowcontrol.apiserver.k8s.io/v1beta1/FlowSchema --api-versions flowcontrol.apiserver.k8s.io/v1beta1/PriorityLevelConfiguration --api-versions flowcontrol.apiserver.k8s.io/v1beta2 --api-versions flowcontrol.apiserver.k8s.io/v1beta2/FlowSchema --api-versions flowcontrol.apiserver.k8s.io/v1beta2/PriorityLevelConfiguration --api-versions networking.k8s.io/v1 --api-versions networking.k8s.io/v1/Ingress --api-versions networking.k8s.io/v1/IngressClass --api-versions networking.k8s.io/v1/NetworkPolicy --api-versions node.k8s.io/v1 --api-versions node.k8s.io/v1/RuntimeClass --api-versions node.k8s.io/v1beta1 --api-versions node.k8s.io/v1beta1/RuntimeClass --api-versions policy/v1 --api-versions policy/v1/PodDisruptionBudget --api-versions policy/v1beta1 --api-versions policy/v1beta1/PodDisruptionBudget --api-versions policy/v1beta1/PodSecurityPolicy --api-versions rbac.authorization.k8s.io/v1 --api-versions rbac.authorization.k8s.io/v1/ClusterRole --api-versions rbac.authorization.k8s.io/v1/ClusterRoleBinding --api-versions rbac.authorization.k8s.io/v1/Role --api-versions rbac.authorization.k8s.io/v1/RoleBinding --api-versions scheduling.k8s.io/v1 --api-versions scheduling.k8s.io/v1/PriorityClass --api-versions storage.k8s.io/v1 --api-versions storage.k8s.io/v1/CSIDriver --api-versions storage.k8s.io/v1/CSINode --api-versions storage.k8s.io/v1/StorageClass --api-versions storage.k8s.io/v1/VolumeAttachment --api-versions storage.k8s.io/v1beta1 --api-versions storage.k8s.io/v1beta1/CSIStorageCapacity --api-versions v1 --api-versions v1/ConfigMap --api-versions v1/Endpoints --api-versions v1/Event --api-versions v1/LimitRange --api-versions v1/Namespace --api-versions v1/Node --api-versions v1/PersistentVolume --api-versions v1/PersistentVolumeClaim --api-versions v1/Pod --api-versions v1/PodTemplate --api-versions v1/ReplicationController --api-versions v1/ResourceQuota --api-versions v1/Secret --api-versions v1/Service --api-versions v1/ServiceAccount --api-versions vpcresources.k8s.aws/v1beta1 --api-versions vpcresources.k8s.aws/v1beta1/SecurityGroupPolicy --include-crds

failed timeout after 1m30s.

After following some Issues in this repository, we bumped the resource requests, execution time of our repo-server, added replicas to our controller, and flushed our redis pods. No luck.

I'm not sure whether this issue is related to this feature or not. The branch i've pulled from this PR seems to be built from 2.6.0.xxx, so perhaps it's an old bug from that release, compounded with this feature.

I've taken the liberty of adding some extra info logging statements around the Helm templating functions in reposerver/repository/repository.go. Seeing as I've not come across an issue around template in any other Issues, I thought it best to investigate. Will elaborate findings soon.

harrywm avatar Apr 28 '23 15:04 harrywm

Thanks @crenshaw-dev , any timeline when these changes will be released?

mnikser avatar May 01 '23 18:05 mnikser

@harrywm That seems rather strange. In that helm command the deployment namespace (argo-cd) does not match the application's (data-dog) in the appdef. And I may be missing something but the app's name in --name-template infrastructure-apps also does not match anything in the appdef resource.

Another thing is, that command has a single values defined for /tmp/afaead53-5c5a-46c6-80d8-5f8be43dcda4, and nothing that seems to correlate with the Application resource's fields. I'm not sure whether there's any correlation here with this patch.

And regarding the 2.6.x version, I frequently update the branch. You can just clone my forked repo and pull the branch directly, and build the image from there. I'mdoing my best to keep it updated with upstream master - including updating it in our internal PoC environment and seeing how apps behave there.

gczuczy avatar May 01 '23 18:05 gczuczy

@mnikser no timeline yet. I wasn't able to secure review time for the 2.8 cycle (other priorities). I'm hoping another Approver can take it on for 2.8.

crenshaw-dev avatar May 01 '23 19:05 crenshaw-dev

I think there need to be some way of indicating what git revision targetRevision was resolved to, so plugins, or someone inspecting the Application CRD can discover where the values came from. Otherwise will be hard to debug if updates from git have been applied by ArgoCD?

jimmyjones2 avatar May 27 '23 09:05 jimmyjones2

@jimmyjones2 I believe that's already stored in the Application status field, as a feature of multi-source apps.

crenshaw-dev avatar May 27 '23 14:05 crenshaw-dev

@jimmyjones2 I believe that's already stored in the Application status field, as a feature of multi-source apps.

@crenshaw-dev Have raised #13859 - it doesn't seem to be

jimmyjones2 avatar May 31 '23 20:05 jimmyjones2

Found out that when updating the git referenced source, the copied file is not actually moved to the helm chart, because it already exists from a previous sync - the chart is the same version, it's not fetched and extracted again, and its kept simply there on the filesystem. This resulted in a weird not-always-syncing-on-updating-the-values in its repo. With switching strategies from avoiding overwrites to always overwriting (and putting a warning about it) it's better.

@crenshaw-dev @ishitasequeira If you happen to know how to do this better, please let me know. I'm not exactly sure whether there is a logic here looking for connections between the sources: when the copy source is updated, the copy's destination's cache has to be invalidated. I've tried to affect this in appcontroller.go, however I've realized I was doing something else at those places - so this is reverted. I think this might need a look into still.

gczuczy avatar Jun 07 '23 09:06 gczuczy

any progress?

vl-kp avatar Jun 27 '23 14:06 vl-kp

@vl-kp Gergely is waiting on me for a review. Unfortunately I won't be able to get this into 2.8. I still think it's a good PR and still plan to dedicate time to getting it merged, I'm just out of time this cycle.

crenshaw-dev avatar Jun 27 '23 14:06 crenshaw-dev

Please keep working on it. We are eagerly waiting for this solution.

guido-1 avatar Jul 01 '23 20:07 guido-1

just wondering, currently if we use multisource, the UI shows the commit/version for the helm chart instead of the ref one, will there anything change if this PR merge?

vl-kp avatar Jul 11 '23 00:07 vl-kp