kustomize icon indicating copy to clipboard operation
kustomize copied to clipboard

Concurrency issues with Helm generator - archive already exists

Open gaeljw opened this issue 2 years ago • 24 comments
trafficstars

What happened?

Given the following file structure where the base has a helmChart reference:

base/kustomization.yaml <-- has a helmChart reference
overlays/overlay1/kustomization.yaml
overlays/overlay2/kustomization.yaml

If you run Kustomize on both overlays at the same time, there's a chance one of the 2 commands fail with following error:

Error: failed to untar:
a file or directory with the name /tmp/[email protected]_xxxxxx_devops_apps/base/apps/data-query-gateway/charts/k8s-app-1.2.7.tgz already exists:
unable to run: 'helm pull --untar --untardir /tmp/[email protected]_xxxxxx_devops_apps/base/apps/data-query-gateway/charts --repo https://argo:[email protected]/api/v4/projects/29393074/packages/helm/stable k8s-app --version 1.2.7'
with env=[HELM_CONFIG_HOME=/tmp/kustomize-helm-837700893/helm HELM_CACHE_HOME=/tmp/kustomize-helm-837700893/helm/.cache HELM_DATA_HOME=/tmp/kustomize-helm-837700893/helm/.data]
(is 'helm' installed?)

See several users reporting the issue in the context of ArgoCD where it's a common setup to have multiple "applications" concurrently synchronizing and if they happen to share same Kustomize base, the error above happens.

What did you expect to happen?

Ideally, it should be possible to run Kustomize on both overlays without any concurrency issues.

How can we reproduce it (as minimally and precisely as possible)?

N/A

Expected output

N/A

Actual output

N/A

Kustomize version

5.0.3

Operating system

Linux

gaeljw avatar Aug 17 '23 18:08 gaeljw

I haven't looked at the code yet. Is there any reason to raise this error in the 1st place?

I see several options:

  • do not check for file already present, if file already present, assume it's the good one and continue
  • implement some kind of "lock"/retries mechanism: if a run is ongoing, wait few milliseconds and retry?
  • download the charts in a repo with a unique name. I think this would conflict with existing features though.

Maybe the new behaviour could be enabled with a flag like --ignore-helm-concurrency-issue (just to give the idea, the name should be shorter of course!) if we're not confident it won't break existing usages.

gaeljw avatar Aug 17 '23 18:08 gaeljw

Actually, looking now at the code, Kustomize is "only" calling Helm with a command similar to helm pull --untar --untardir ... --repo ... ... and the error comes from Helm.

https://github.com/kubernetes-sigs/kustomize/blob/911ddcda40a8c3591a1eee3929f775c8c8c57330/plugin/builtin/helmchartinflationgenerator/HelmChartInflationGenerator.go#L285-L296

It would be nice if Helm supported some kind of "overwrite" mode then I guess. But this might still bring concurrency issues? Maybe it's still best to handle it at Kustomize level?

Related issues on Helm:

  • https://github.com/helm/helm/issues/7182
  • https://github.com/helm/helm/issues/12315

gaeljw avatar Aug 18 '23 06:08 gaeljw

I am also encountering this problem.

be-hase avatar Sep 26 '23 00:09 be-hase

hi @gaeljw if you don't mind ,can you please share the kustomization file along with files you are using under the overlays directory so that we can also debug it once

DiptoChakrabarty avatar Oct 16 '23 14:10 DiptoChakrabarty

@DiptoChakrabarty I've created an example here:

https://github.com/greenkiwi/kustomize-helm-concurrency-issue

It can be run locally or it can be run via the Github Action there. It has our script that runs kustomize in parallel for the builds. Note, I've added a for loop, just to help ensure that the problem can be seen.

It also appears that if the chart is already there with the correct version, it doesn't have an issue.

greenkiwi avatar Oct 16 '23 20:10 greenkiwi

@DiptoChakrabarty I can provide another minimal reproduction case as well if needed but it's quite similar to the one given by @greenkiwi (thanks!)

gaeljw avatar Oct 17 '23 07:10 gaeljw

Could a potential fix be to use os.MkdirTemp as the untar target and then copy the resulting filesystem into the target path, so helm won't fight itself?

shakefu avatar Nov 02 '23 19:11 shakefu

/assign @annasong20

for further investigation

natasha41575 avatar Nov 22 '23 17:11 natasha41575

Is there any progress on this issue? Do you think there's a chance it can be fixed?

gaeljw avatar Nov 28 '23 18:11 gaeljw

/unassign Sorry @natasha41575, @gaeljw, I don't think I'll have time to investigate this issue.

annasong20 avatar Dec 04 '23 16:12 annasong20

Hi @gaeljw, sorry for the late response. I can reproduce your issue. Yes, as your comments point out, the problem is due to parallel calls to https://github.com/kubernetes-sigs/kustomize/blob/master/plugin/builtin/helmchartinflationgenerator/HelmChartInflationGenerator.go#L256-L261. Multiple threads see that the .tgz file doesn't exist and attempt to download it. After the 1st thread, successive downloads fail because the .tgz was downloaded by the 1st.

Our stance is that this is a helm issue, because Helm clearly doesn't support parallel runs of helm pull, as aforementioned. Pasting the terminal output here for clarity

helm pull --untar --untardir /tmp/charts/minecraft-3.1.3 --repo https://itzg.github.io/minecraft-server-charts minecraft --version 3.1.3

Error: failed to untar: a file or directory with the name /tmp/charts/minecraft-3.1.3/minecraft-3.1.3.tgz already exists

See Kustomize's stance on Helm support here.

However, if this issue is popular enough, I personally would be willing to accept a patch that either uses "locks" or unique download directory names (which will hurt performance). Would need to discuss with other Kustomize maintainers, especially @koba1t.

/area helm /triage under-consideration

annasong20 avatar Dec 20 '23 18:12 annasong20

@annasong20 Should we also be raising this issue with helm?

greenkiwi avatar Dec 20 '23 22:12 greenkiwi

Thanks for the feedback @annasong20

For now Helm considers this is not an issue with Helm but its usage in Kustomize. See the related issue: https://github.com/helm/helm/issues/12315

If anyone have strong arguments to present to Helm, please do :)

I'm more in the idea of fixing it in Kustomize but I don't have enough knowledge in Go to proceed with a PR.

gaeljw avatar Dec 21 '23 08:12 gaeljw

@gaeljw Thanks for the context! I can empathize with Helm's argument.

Will discuss this issue with the team after the holidays, then we can update the triage status. /cc @greenkiwi @koba1t @natasha41575

annasong20 avatar Dec 21 '23 16:12 annasong20

We have discussed this among the maintainers and we agree that this is a valid issue and we should try to fix it if we can. We are not sure that a locking mechanism will be possible because it is two different processes running, but we might be able to do something with the directory structure (e.g. adding a hash). If we modify the directory structure, we need to be really careful that we don't break anything with users' existing helm charts, whether those are local charts or remote charts that have already been pulled and cached by kustomize.

We are also open to other solutions if anyone has a feasible alternative.

/triage accepted

natasha41575 avatar Jan 03 '24 18:01 natasha41575

I feel this opinion helps to resolve this problem. https://github.com/helm/helm/issues/12315#issuecomment-1688478222

Ignore this error from Helm.

This idea helps resolve this problem without hurting performance and breaking change. I believe this is a better solution, but it takes time to implement.

koba1t avatar Jan 03 '24 20:01 koba1t

I am waiting for this PR to be merged. https://github.com/helm/helm/pull/12725

koba1t avatar Jan 24 '24 17:01 koba1t

any updates on this issue? there's no activity on the above mentioned PR in the last 3 months.

imperfect-fourth avatar Apr 09 '24 09:04 imperfect-fourth

I don't think https://github.com/helm/helm/pull/12725 get merged any time soon. It would be faster if we fix this in kustomize.

Skaronator avatar Jun 21 '24 12:06 Skaronator

For the record, at my company, we changed our structure to workaround this issue.

Instead of having N Kustomize folders -> 1 Kustomize base -> 1 Helm chart, we integrated the generation of the N variants in the Helm chart itself using arrays/dict in the Helm values.

In our case, it makes sense and it's easier to maintain in the end, but it might not be for everyone for sure.

gaeljw avatar Jun 21 '24 14:06 gaeljw

Adding retry-mechanism to solve this in kustomize not helm, right? Is there any idea?

shoppingjaws avatar Sep 24 '24 07:09 shoppingjaws