
ApplicationSet Controller Can't Clone Repositories Due to fork Error

Open • caseyscarborough opened this issue 2 years ago • 9 comments

Description

After some period of time, the ApplicationSet controller becomes unable to create new Applications based on a repository. This is what appears in the logs:

time="2022-03-22T19:53:56Z" level=info msg="git fetch origin master --tags --force" dir=/tmp/https___github.com_my-org_my-repo-ocp-config execID=bUpzx
time="2022-03-22T19:53:56Z" level=error msg="`git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable" execID=bUpzx
time="2022-03-22T19:53:56Z" level=info msg=Trace args="[git fetch origin master --tags --force]" dir=/tmp/https___github.com_my-org_my-repo-ocp-config operation_name="exec git" time_ms=2.7651749999999997
time="2022-03-22T19:53:56Z" level=error msg="error generating params" error="Error during fetching repo: `git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable" generator="&{0xc000a1abc0}"
time="2022-03-22T19:53:56Z" level=error msg="error generating application from params" error="Error during fetching repo: `git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable" generator="{<nil> <nil> 0xc000cdd380 <nil> <nil> <nil>}"
2022-03-22T19:53:56.156Z	ERROR	controller-runtime.manager.controller.applicationset	Reconciler error	{"reconciler group": "argoproj.io", "reconciler kind": "ApplicationSet", "name": "my-repo-uat-apps", "namespace": "openshift-gitops", "error": "Error during fetching repo: `git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable", "errorVerbose": "`git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable\nError during fetching repo\ngithub.com/argoproj-labs/applicationset/pkg/services.checkoutRepo\n\t/remote-source/app/pkg/services/repo_service.go:156\ngithub.com/argoproj-labs/applicationset/pkg/services.(*argoCDService).GetDirectories\n\t/remote-source/app/pkg/services/repo_service.go:81\ngithub.com/argoproj-labs/applicationset/pkg/generators.(*GitGenerator).generateParamsForGitDirectories\n\t/remote-source/app/pkg/generators/git.go:74\ngithu...
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:214

If you delete the openshift-gitops-applicationset-controller pod, everything begins to work properly again.

It seems to be this error specifically:

Error during fetching repo: `git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable
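`Resource temporarily unavailable` is the C library's message for `EAGAIN`, which `fork()` returns when a process limit is reached, for example the cgroup `pids.max` limit that bounds a container. Exited children that are never reaped (zombies) keep their PID until they are collected with `wait()`, so they count against that limit. The sketch below counts processes currently in the zombie state; it assumes a Linux `/proc` filesystem and has to run somewhere you can still start processes that see the affected PIDs (such as the node). `count_zombies` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import os


def count_zombies(proc_root: str = "/proc") -> int:
    """Count processes in state 'Z' (zombie) by reading /proc/<pid>/stat.

    Linux-only sketch: /proc/<pid>/stat has the form "pid (comm) state ...",
    and comm may itself contain spaces or parentheses, so we split on the
    last ')' before reading the state field.
    """
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process disappeared between listdir() and open()
        state = stat.rpartition(")")[2].split()[0]
        if state == "Z":
            zombies += 1
    return zombies


if __name__ == "__main__":
    print(f"zombie processes: {count_zombies()}")
```

A steadily growing count for the controller's processes would be consistent with the missing process reaper discussed later in this thread.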

If you open a shell into the applicationset controller pod, no commands will run and each one returns an error similar to the following:

[Screenshot not reproduced: shell error output from the applicationset controller pod, captured 2022-03-22 at 3:59:22 PM]

Steps to Reproduce

Unfortunately I don't have exact steps to reproduce, as it appears to be non-deterministic. I don't find out about it until Applications stop appearing when new folders are created in configuration repositories (we are using the git generator in our ApplicationSets).
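Because the failure only becomes visible once Applications stop appearing, one option is to scan the controller's logs for git's `cannot fork()` message. A minimal sketch, assuming log lines arrive as plain strings (for example from `oc logs` on the controller pod); `find_fork_failures` and `FORK_FAILURE` are hypothetical names:

```python
import re

# Matches git's "cannot fork() for <helper>: Resource temporarily unavailable"
# message, whether the helper is git-remote-https or an ssh command line.
FORK_FAILURE = re.compile(
    r"cannot fork\(\) for (?P<helper>.+?): Resource temporarily unavailable"
)


def find_fork_failures(lines):
    """Return the helper named in each log line that reports a fork() failure.

    Only the first match per line is reported, so a line that repeats the
    error (e.g. in both "error" and "errorVerbose") is counted once.
    """
    helpers = []
    for line in lines:
        match = FORK_FAILURE.search(line)
        if match:
            helpers.append(match.group("helper"))
    return helpers
```

Any non-empty result could feed an alert, so the condition is noticed before new folders silently stop producing Applications.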

Additional context

We are running the latest version (1.4.3) of the OpenShift GitOps Operator on OpenShift Container Platform 4.7.

Our workaround is deleting the pod, and then it works again temporarily.

caseyscarborough • Mar 22 '22

This will be fixed with the update to ApplicationSet v0.4.z in GitOps release 1.5.

The current shipped version (0.2) does not have an init-style process reaper in the container, so zombie processes may stack up under certain circumstances.
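An init-style reaper such as tini runs as PID 1 and, whenever a child exits (`SIGCHLD`), collects every exited child so that none lingers as a zombie. The non-blocking loop at the core of that technique can be sketched in Python as follows; `reap_children` is a hypothetical name, and this illustrates the idea rather than reproducing tini's source.

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking.

    Processes orphaned inside a container are re-parented to PID 1. If PID 1
    is the application itself and never calls wait(), those orphans remain
    zombies after exiting and keep occupying PIDs.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

In a real PID 1 wrapper this loop would run from a `SIGCHLD` handler, and the wrapper would also forward signals to the main process and exit with its status.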

jannfis • Mar 29 '22

Great news, thanks for the info and I'll keep an eye out for the 1.5 update.

caseyscarborough • Mar 29 '22

Hi @caseyscarborough, GitOps v1.5.0 is out. This issue should be fixed now. Can you please confirm whether you are unblocked?

iam-veeramalla • May 13 '22

Hi @iam-veeramalla, it looks like we're still on OpenShift 4.7, so the upgrade isn't available to us. Do you know if there are plans to make it available on 4.7, or will we need to upgrade to 4.8+ to update the GitOps operator to 1.5.0?

caseyscarborough • May 13 '22

Hi @iam-veeramalla, we are facing the same problem with OpenShift 4.10.13, GitOps 1.5.1 and ApplicationSet controller v0.4.1, so unfortunately the update to the new GitOps Operator did not solve the bug. Here's an extract of the attached ApplicationSet controller log file:

...
[openshift-gitops-applicationset-controller-77475f899c-z5lhw-argocd-applicationset-controller.log](https://github.com/redhat-developer/gitops-operator/files/8858898/openshift-gitops-applicationset-controller-77475f899c-z5lhw-argocd-applicationset-controller.log)

time="2022-06-08T05:49:25Z" level=info msg=Trace args="[git fetch origin master --tags --force]" dir=/tmp/…_7999_ao_cluster-dev operation_name="exec git" time_ms=2.725453
time="2022-06-08T05:49:25Z" level=error msg="error generating params" error="Error during fetching repo: `git fetch origin master --tags --force` failed exit status 128: error: cannot fork() for ssh -i /dev/shm/3431765484 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null: Resource temporarily unavailable\nfatal: unable to fork" generator="&{0xc000a3aeb0}"
time="2022-06-08T05:49:25Z" level=error msg="error generating application from params" error="Error during fetching repo: `git fetch origin master --tags --force` failed exit status 128: error: cannot fork() for ssh -i /dev/shm/3431765484 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null: Resource temporarily unavailable\nfatal: unable to fork" generator="{<nil> <nil> 0xc001155a00 <nil> <nil> <nil> <nil> <nil>}"

fr-finnova • Jun 08 '22

The issue is that the fix requires the ApplicationSet controller to be started with entrypoint.sh as its command: https://github.com/argoproj/applicationset/pull/453/files#diff-11178f5b2367762f497dbdb1d825a5061c49e20b5c0176ea79c39a9502585225R20

But this isn't the case with 1.5.2:

$ oc get Subscription.operators.coreos.com -n openshift-operators openshift-gitops-operator -o yaml | grep current
  currentCSV: openshift-gitops-operator.v1.5.2

$ oc get pod -n openshift-gitops -l app.kubernetes.io/name=openshift-gitops-applicationset-controller -o json | jq '.items[].spec.containers[].command'
[
  "applicationset-controller",
  "--argocd-repo-server",
  "openshift-gitops-repo-server.openshift-gitops.svc.cluster.local:8081",
  "--loglevel",
  "info"
]
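To check a running pod in one step, the JSON from the `oc get pod ... -o json` command above can be inspected for whether each container's command starts with an init wrapper. A sketch under stated assumptions: the wrapper names in `INIT_WRAPPERS` (`entrypoint.sh`, `tini`, and the additional `dumb-init`) are illustrative, `containers_without_init` is a hypothetical helper, and an empty `command` means the image's own `ENTRYPOINT` runs, which this check cannot see and therefore reports as unverified.

```python
import json
import posixpath

# Illustrative set of init wrappers; adjust to what your image actually ships.
INIT_WRAPPERS = {"entrypoint.sh", "tini", "dumb-init"}


def containers_without_init(pod_list_json: str) -> list:
    """Return "pod/container" for each container not started through an init wrapper.

    `pod_list_json` is the output of `oc get pod ... -o json` (a List of Pods).
    An empty or missing `command` means the image ENTRYPOINT is used, which
    this check cannot inspect, so such containers are reported as well.
    """
    flagged = []
    for pod in json.loads(pod_list_json).get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"]["containers"]:
            command = container.get("command") or []
            # Compare only the executable name, so /usr/local/bin/tini also matches.
            first = posixpath.basename(command[0]) if command else ""
            if first not in INIT_WRAPPERS:
                flagged.append(f"{pod_name}/{container['name']}")
    return flagged
```

With the 1.5.2 output shown above, whose command starts with `applicationset-controller`, the controller container would be flagged.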

duritong • Jun 15 '22

@iam-veeramalla: did we plan to add tini as a process reaper by specifying it in the command field of the Deployment?

rishabh625 • Jun 15 '22

@iam-veeramalla we are still seeing this issue. Is it possible to add the upstream fix posted by @duritong?

caseyscarborough • Aug 08 '22

We found this solution on the Red Hat knowledge base. It says this issue is fixed in version 1.6 of the operator, which was released a few days ago. We will update our GitOps Operator and check if it resolves the issue.

caseyscarborough • Aug 08 '22