gitops-operator
ApplicationSet Controller Can't Clone Repositories Due to fork Error
Description
After some period of time, the ApplicationSet controller becomes unable to create new Applications based on a repository. This is what appears in the logs:
time="2022-03-22T19:53:56Z" level=info msg="git fetch origin master --tags --force" dir=/tmp/https___github.com_my-org_my-repo-ocp-config execID=bUpzx
time="2022-03-22T19:53:56Z" level=error msg="`git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable" execID=bUpzx
time="2022-03-22T19:53:56Z" level=info msg=Trace args="[git fetch origin master --tags --force]" dir=/tmp/https___github.com_my-org_my-repo-ocp-config operation_name="exec git" time_ms=2.7651749999999997
time="2022-03-22T19:53:56Z" level=error msg="error generating params" error="Error during fetching repo: `git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable" generator="&{0xc000a1abc0}"
time="2022-03-22T19:53:56Z" level=error msg="error generating application from params" error="Error during fetching repo: `git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable" generator="{<nil> <nil> 0xc000cdd380 <nil> <nil> <nil>}"
2022-03-22T19:53:56.156Z ERROR controller-runtime.manager.controller.applicationset Reconciler error {"reconciler group": "argoproj.io", "reconciler kind": "ApplicationSet", "name": "my-repo-uat-apps", "namespace": "openshift-gitops", "error": "Error during fetching repo: `git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable", "errorVerbose": "`git fetch origin master --tags --force` failed exit status 255: error: cannot fork() for git-remote-https: Resource temporarily unavailable\nError during fetching repo\ngithub.com/argoproj-labs/applicationset/pkg/services.checkoutRepo\n\t/remote-source/app/pkg/services/repo_service.go:156\ngithub.com/argoproj-labs/applicationset/pkg/services.(*argoCDService).GetDirectories\n\t/remote-source/app/pkg/services/repo_service.go:81\ngithub.com/argoproj-labs/applicationset/pkg/generators.(*GitGenerator).generateParamsForGitDirectories\n\t/remote-source/app/pkg/generators/git.go:74\ngithu...
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214
If you delete the openshift-gitops-applicationset-controller pod, everything begins to work properly again.
It seems to be this error specifically:
Error during fetching repo: `git fetch origin master --tags --force` failed exit status 255:
error: cannot fork() for git-remote-https: Resource temporarily unavailable
If you open a shell into the applicationset controller pod, no commands will run and each one returns an error similar to the following:

Steps to Reproduce
Unfortunately I don't have exact steps to reproduce, as the problem appears to be non-deterministic. I don't know about it until Applications stop appearing when new folders are created in configuration repositories (we are using the git generator in our ApplicationSets).
Additional context
We are running the latest version (1.4.3) of the OpenShift GitOps Operator on OpenShift Container Platform version 4.7.
Our workaround is deleting the pod, and then it works again temporarily.
This will be fixed with the update to ApplicationSet v0.4.z in GitOps release 1.5.
The current shipped version (0.2) does not have an init-style process reaper in the container, so zombie processes may stack up under certain circumstances.
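One way to check whether zombies are in fact piling up in the controller container is to count processes in state `Z` under `/proc`. The following is only a sketch: `count_zombies` and its optional `proc_root` argument are hypothetical names, and it assumes a POSIX shell is available in the container. It uses shell builtins only, so calling it should not itself require a fork.

```shell
# Count zombie (state "Z") processes by reading /proc with shell builtins only.
# proc_root is a hypothetical parameter (defaults to /proc) that makes the
# function easy to try against a fake directory tree.
count_zombies() {
    proc_root="${1:-/proc}"
    count=0
    for stat_file in "$proc_root"/[0-9]*/stat; do
        # Skip an unmatched glob and processes that exited in the meantime.
        [ -r "$stat_file" ] || continue
        IFS= read -r line < "$stat_file" || continue
        # Format is "pid (comm) state ..."; comm may contain spaces or ")",
        # so strip everything up to the last ") " before reading the state.
        rest="${line##*) }"
        state="${rest%% *}"
        [ "$state" = "Z" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Call `count_zombies` directly rather than inside `$(...)`, since command substitution forks a subshell. A number that keeps growing between calls would be consistent with the missing process reaper described above.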
Great news, thanks for the info and I'll keep an eye out for the 1.5 update.
Hi @caseyscarborough, GitOps v1.5.0 is out. This issue should be fixed now. Can you please confirm if you are unblocked?
Hi @iam-veeramalla, looks like we're still on OpenShift 4.7 so the upgrade isn't available to us. Do you know if there are plans to add it to 4.7 or will we need to upgrade to 4.8+ to update the GitOps operator to 1.5.0?
Hi @iam-veeramalla we are facing the same problem with OpenShift 4.10.13, GitOps 1.5.1 and ApplicationSet controller v0.4.1. So unfortunately the update to the new GitOps Operator did not solve the bug. Here's an extract of the attached ApplicationSet controller logfile:
...
[openshift-gitops-applicationset-controller-77475f899c-z5lhw-argocd-applicationset-controller.log](https://github.com/redhat-developer/gitops-operator/files/8858898/openshift-gitops-applicationset-controller-77475f899c-z5lhw-argocd-applicationset-controller.log)
time="2022-06-08T05:49:25Z" level=info msg=Trace args="[git fetch origin master --tags --force]" dir=/tmp/[email protected]_7999_ao_cluster-dev operation_name="exec git" time_ms=2.725453
time="2022-06-08T05:49:25Z" level=error msg="error generating params" error="Error during fetching repo: `git fetch origin master --tags --force` failed exit status 128: error: cannot fork() for ssh -i /dev/shm/3431765484 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null: Resource temporarily unavailable\nfatal: unable to fork" generator="&{0xc000a3aeb0}"
time="2022-06-08T05:49:25Z" level=error msg="error generating application from params" error="Error during fetching repo: `git fetch origin master --tags --force` failed exit status 128: error: cannot fork() for ssh -i /dev/shm/3431765484 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null: Resource temporarily unavailable\nfatal: unable to fork" generator="{<nil> <nil> 0xc001155a00 <nil> <nil> <nil> <nil> <nil>}"
For the fix to take effect, the applicationset controller needs to use entrypoint.sh as its command: https://github.com/argoproj/applicationset/pull/453/files#diff-11178f5b2367762f497dbdb1d825a5061c49e20b5c0176ea79c39a9502585225R20
But this isn't the case with 1.5.2:
$ oc get Subscription.operators.coreos.com -n openshift-operators openshift-gitops-operator -o yaml | grep current
currentCSV: openshift-gitops-operator.v1.5.2
$ oc get pod -n openshift-gitops -l app.kubernetes.io/name=openshift-gitops-applicationset-controller -o json | jq '.items[].spec.containers[].command'
[
"applicationset-controller",
"--argocd-repo-server",
"openshift-gitops-repo-server.openshift-gitops.svc.cluster.local:8081",
"--loglevel",
"info"
]
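The check above can be wrapped in a small helper for repeated use after operator upgrades. This is a rough sketch: `uses_entrypoint` is a hypothetical name, it assumes the fixed Deployment lists a path ending in `entrypoint.sh` as the first command entry, and it only handles plain strings without escaped quotes.

```shell
# Succeeds (exit 0) when the first entry of a JSON command array read from
# stdin ends in "entrypoint.sh"; fails (exit 1) otherwise.
# Assumption: the array holds plain strings with no escaped quotes.
uses_entrypoint() {
    # Join the lines, then capture the first quoted string after the "[".
    first=$(tr -d '\n' | sed -n 's/^[[:space:]]*\[[[:space:]]*"\([^"]*\)".*/\1/p')
    case "$first" in
        *entrypoint.sh) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, piping `jq '.items[0].spec.containers[0].command'` from the `oc get pod` call into `uses_entrypoint && echo "wrapped"` would print nothing for the command array shown above.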
@iam-veeramalla: did we plan to add tini as a process reaper by specifying it in the command field of the Deployment?
@iam-veeramalla we are still seeing this issue. Is it possible to add the upstream fix posted by @duritong?
We found this solution on the Red Hat knowledge base. It says this issue is fixed in version 1.6 of the operator, which was released a few days ago. We will update our GitOps Operator and check if it resolves the issue.