argo-helm icon indicating copy to clipboard operation
argo-helm copied to clipboard

Argo CD: PersistentVolumeClaims for volumes

Open docent-net opened this issue 3 years ago • 31 comments

Is your feature request related to a problem? Please describe.

I'm trying to mount remote storage to the argocd repo-server. I can provide volumeMounts list as well as volumes list, but I can't create PVC for those volumes using helm chart.

Describe the solution you'd like

volumes.yaml could contain a definition of StorageClass name for PVC. When provided, the volume mount would be created on top of this PVC.

Describe alternatives you've considered

Alternatively, documentation could be extended with information about using PVC and manual creation of PVC.

Additional context

I can see, that similar feature is already implemented in redis-ha: https://github.com/argoproj/argo-helm/blob/d7da8e863f30a6975b89a1bba34855d83bed59e3/charts/argo-cd/charts/redis-ha/templates/redis-ha-statefulset.yaml#L285

Also, this request is only about repo-server, but similar could be requested for Redis and other components.

If this looks like something useful, I can provide PR.

docent-net avatar Sep 03 '20 21:09 docent-net

I think this is more than useful and should definitely comes outside of the box. @docent-net any chance you can get a stab at this fairly soon ?

tlvenn avatar Oct 17 '20 00:10 tlvenn

Sure, on it :)

docent-net avatar Oct 18 '20 10:10 docent-net

@tlvenn I need to ask a question, about this. Read through ArgoCD docs (e.g. here https://argoproj.github.io/argo-cd/operator-manual/high_availability/#argocd-repo-server) and it looked like a straightforward thing - just create volumeClaimTemplates with proper configuration for the PVC and that's all.

However, repo-server can be replicated, so here it hit me, that maybe we'd need a StatefulSet here instead of Deployment, because there is some state.. or not? Unfortunately, I'm not that familiar w/ArgoCD HA, but after going through the documentation and thinking about it I have 2 assumptions to discuss:

  1. ArgoCD is rather a stateless app (but for the Redis, which is a cache, so its data can be lost). We don't really care about contents of the repo-server, and the only "important" directory is /tmp, where repos are cloned. "important" because the only reason we care about mounting it from the remote repo is that its size can grow and Pod disk can run out of space. In this case, when Pod is lost for some reason (node down, Pod was evicted by scheduler) underlying storage can be safely removed by k8s, as it will not be used anymore and its contents are not important. In this case, we can stay with Deployment resource as it's currently defined and basically create new PVs via volumeClaimTemplates.
  2. We care about the state and contents of /tmp and we need to replace Deployment w/Statefulset to be sure, that replicas use the same storage as before.

After writing this whole post I think that the correct answer is nr.1, but just need to validate it w/you :) Thanks

docent-net avatar Nov 20 '20 18:11 docent-net

Hi @docent-net , I am no expert when it comes to argo cd but my feeling is that the repo server is a cache layer and I would opt for option 1 as well.

tlvenn avatar Nov 23 '20 21:11 tlvenn

For illustration purpose, not saying this a good idea, Grafana chart for example supports both approaches:

https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/deployment.yaml#L1 https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/statefulset.yaml#L1

tlvenn avatar Nov 24 '20 05:11 tlvenn

I'm going w/opt1 - lets face it - the data is stored in /tmp, so let's treat it as temporary :) Thanks, will get into it

docent-net avatar Nov 24 '20 07:11 docent-net

Stale issue message

github-actions[bot] avatar Apr 01 '21 03:04 github-actions[bot]

@docent-net did you make any progress on this ?

tlvenn avatar Apr 09 '21 17:04 tlvenn

I'm a little confused...I am trying to deploy ArgoCD in HA mode and use a PVC for the repo-server to avoid the disk consumption issue, described above. Of course, there is an issue with one PVC. Option 1 above mentions leave as a Deployment and use volumeClaimTemplates....but those aren't supported for Deployments, only SS. Am I missing something? Do I just convert to SS?

jbartlet avatar Apr 22 '21 15:04 jbartlet

Can this issue be re-opened? Right now the only possible option without making changes to the helm chart is to use emptyDir

ramanNarasimhan77 avatar Jun 07 '21 17:06 ramanNarasimhan77

Please reopen this issue:

I'm using Longhorn replicated blockstorage, indeed it is my default storageclass and I would love to make use of it for argocd. This should be the default behavior at least for non-HA, @tlvenn gave a good example for a working non-HA/HA Chart:

For illustration purpose, not saying this a good idea, Grafana chart for example supports both approaches:

https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/deployment.yaml#L1 https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/statefulset.yaml#L1

Furthermore as @docent-net already pointed out, this Chart is partially making use of the ability to set a custom storageclass:

Additional context

I can see, that similar feature is already implemented in redis-ha:

https://github.com/argoproj/argo-helm/blob/d7da8e863f30a6975b89a1bba34855d83bed59e3/charts/argo-cd/charts/redis-ha/templates/redis-ha-statefulset.yaml#L285

kaidobit avatar Dec 16 '21 22:12 kaidobit

https://kubernetes.io/docs/concepts/storage/_print/#generic-ephemeral-volumes is alpha in 1.19, beta in 1.21 and stable in 1.23.

snuggie12 avatar May 07 '22 04:05 snuggie12

Please reopen this issue.

pierluigilenoci avatar May 25 '22 13:05 pierluigilenoci

@mkilchhofer could you please take a look?

pierluigilenoci avatar May 25 '22 13:05 pierluigilenoci

@pierluigilenoci I don't understand why someone wants to configure PV on Argo CD components.

We only implement use cases which are supported by the upstream project itself (https://github.com/argoproj/argo-cd).

One thing I am okay with is to add the ability to deploy any kind of resources (extra manifests) like the bitnami charts do.

mkilchhofer avatar May 25 '22 17:05 mkilchhofer

@mkilchhofer quoted from the official docs:

argocd-repo-server clones repository into /tmp ( of path specified in TMPDIR env variable ). Pod might run out of disk space if have too many repository or repositories has a lot of files. To avoid this problem mount persistent volume.

The problem is they officially tell you to do something without telling you how to.

FWIW, I'm using my suggested approach of ephemeral volumes while keeping it a Deployment but I know that's not available to everyone since it's in 1.21+ as beta:

volume for tmp for repo-server
spec:
  template:
    spec:
      volumes:
      - ephemeral:
          volumeClaimTemplate:
            metadata:
              creationTimestamp: null
              labels:
                type: argocd-repo-server-tmp
            spec:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 10Gi
              storageClassName: standard-ssd
              volumeMode: Filesystem

snuggie12 avatar May 25 '22 17:05 snuggie12

ACK. That makes sense. There's already an issue upstream for that: https://github.com/argoproj/argo-cd/issues/7927

mkilchhofer avatar May 26 '22 12:05 mkilchhofer

@mkilchhofer I don't have a particular fetish for PVs.

I just wish that in the presence of large caches, a situation aggravated also by some bug (https://github.com/argoproj/argo-cd/issues/4002 and https://github.com/helm/helm/issues/10583), the Repo Server pods are not evicted ('Pod The node had condition: [DiskPressure]. ') or, even worse, other pods are evicted in the same node (exactly what happened in my case).

In any case the idea of allowing the deployment of extra manifests is always a good idea (and if you look at my history on GitHub it is something I particularly appreciate).

Thanks for your help and your time.

pierluigilenoci avatar May 27 '22 08:05 pierluigilenoci

It might make sense to link the CNCF's Slack (stranded) discussion as well: https://cloud-native.slack.com/archives/C020XM04CUW/p1653511733823829

pierluigilenoci avatar Jun 10 '22 08:06 pierluigilenoci

@docent-net any update on this?

pierluigilenoci avatar Jun 20 '22 07:06 pierluigilenoci

@pierluigilenoci sure. I've been following this discussion (as well as that mentioned on Slack and regarding StatefulSets) and I'm not sure right now.

This issue is about Deployment. My original problem was that having replica=1 for Deployment type, I wanted to have persistent storage for the git cache. Thanks to that, in case of POD replacement, my infra wouldn't be impacted with traffic/performance spike due to cache rebuilding / git repos pulling on a higher scale.

And I have implemented that, and it works on my cluster correctly. But I have only one replica.

Now - with replicas>1 there's problem of AccessModes, that might be very specific. Having it set to ReadWriteMany (and that is a requirement for this workload), could mean a huge performance drop due to underlying file system configurations.

Why do I think so? Thinking about possible scenarios, where this PVC would be used I can only think of considerable caches with many, many (possible little) files (that is how git repositories looks like, thinking .git directories, indexes etc). And FS synchronization across nodes with such huge distributed disk might be really problematic. Now think of possible issues, that could occur as a side effect of it. The user has inconsistent git directories because of some consistency problem of underlying DFS.

Having this said, I'm not sure anymore, that this PR is a good idea. StatefulSet might be a better approach here.

I can still raise a PR, but I'm just afraid, that someone someday will have a big headache due to this.

docent-net avatar Jun 20 '22 13:06 docent-net

@docent-net sorry for the delay in answering.

The answer you are looking for is Statefulset. If you have multiple repo-server replicas you must also have multiple disks, one for each replica. So the solution for assigning 1:1 disks is Statefulset.

Using NFS is a problem of overcomplication and poor performance. From personal experience I can say that sometimes a cache on NFS is slower than re-loading everything from scratch over the network.

The alternative solution is to use emptyDir but it does not solve the node disk load problem. Unless you put the disk in memory but in this case the cost increases.

The perfect solution does not exist but something needs to be done.

pierluigilenoci avatar Jul 05 '22 11:07 pierluigilenoci

I've one solution to propose

  1. default deploy behaviour is the current one with emptyDir

  2. if a different volume definition is provided in values file then it's applied as-is, without any additional implementation, leaving final user the desired approach for volume management .

repoServer:
 existingVolumes:
    tmp_dir:
      persistentVolumeClaim:
        claimName: pvc-argocd-repo-server-tmp

I've performed such changes on my forked repo for details: https://github.com/argoproj/argo-helm/compare/main...aroundthecode:argo-helm:reposerver-pvc

With such apporach user can create PV/PVC outside helm and then attach to the deploy, or leave everything as is. It could be a quick fix allowing user to avoid diskpressure issue and going on discussing Statefulset migration

aroundthecode avatar Aug 22 '22 15:08 aroundthecode

Hello there! Do you intend to open a PR for this solution so we can have it in the oficial Helm Chart? I am trying to migrate an internal chart that we use to the official one and this is one of the features that we would need in order to achieve that. For now I will create a fork but I think that this config should be in the oficial Helm Chart as it is recommended by the High Availability doc.

arielly-parussulo avatar Nov 08 '22 17:11 arielly-parussulo

@aroundthecode

I've performed such changes on my forked repo for details: https://github.com/argoproj/argo-helm/compare/main...aroundthecode:argo-helm:reposerver-pvc

So, are you able to raise a PR for this, since you already have it worked out?

docent-net avatar Nov 08 '22 18:11 docent-net

I ended up creating a PR to add support to deploy argocd-repo-server as a StatefulSet with PVCs based on what we used in my company: https://github.com/argoproj/argo-helm/pull/1648 Feel free to add suggestions or editing it.

arielly-parussulo avatar Nov 16 '22 15:11 arielly-parussulo

@arielly-parussulo, your PR is a bit stranded...

pierluigilenoci avatar Apr 17 '23 11:04 pierluigilenoci

@aroundthecode

I've performed such changes on my forked repo for details: main...aroundthecode:argo-helm:reposerver-pvc

So, are you able to raise a PR for this, since you already have it worked out?

HI all, I've tried to perform an alternative PR #2410 with my approach on this, not sure if everthing is fine, please check and advise if needed

aroundthecode avatar Jan 02 '24 11:01 aroundthecode

@aroundthecode oddly enough I was probably about a week away from making a PR since we're about to start using the helm chart. Yours looks more comprehensive though. Thanks!

snuggie12 avatar Jan 02 '24 17:01 snuggie12

@snuggie12 ( or anyone) can you please help me with the PR? it seems something is missing in the docs generation but I can't realize what.

I've added the helm-docs comments to my values.yaml section but is seems something is still missing since it's telling me the documentation is outdated.

aroundthecode avatar Jan 02 '24 19:01 aroundthecode