argo-repo-server issue: gpg ... --gen-key failed exit status 2

Open nice-pink opened this issue 2 years ago • 24 comments

Checklist:

  • [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • [x] I've included steps to reproduce the bug.
  • [x] I've pasted the output of argocd version.

Describe the bug

After upgrading Argo CD from v2.3.5 to v2.4.3, the argocd-repo-server stopped working and logged:

argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="Generating self-signed gRPC TLS certificate for this session"
argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403" dir= execID=f1898
argocd-repo-server time="2022-06-28T16:18:48Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403` failed exit status 2"
argocd-repo-server time="2022-06-28T16:18:48Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403]" dir= opera
argocd-repo-server time="2022-06-28T16:18:48Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403` failed exit status 2"

This leads to the Argo CD UI showing the error:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.3.43.220:8081: connect: connection refused"

To Reproduce

For me it was just the upgrade.

Expected behavior

argo-repo-server starts up without errors.

Screenshots

Version

argocd: v2.1.3+d855831.dirty
  BuildDate: 2021-09-30T22:11:24Z
  GitCommit: d855831540e51d8a90b1006d2eb9f49ab1b088af
  GitTreeState: dirty
  GoVersion: go1.17.1
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.4.3+471685f

nice-pink avatar Jun 28 '22 17:06 nice-pink

Is it the same machine before and after the upgrade and does it have git access?

Hanyu96 avatar Jul 01 '22 05:07 Hanyu96

It is the same machine, same setup. Just upgraded from 2.3.X to 2.4.X. I'm facing the same issue with all 2.4.X versions. After the downgrade to 2.3.5 repo server works as expected.

nice-pink avatar Jul 01 '22 06:07 nice-pink

Had the same issue when trying to upgrade from 2.2.10. Switched back to the latest 2.3.5 as well.

florianzimm avatar Jul 07 '22 08:07 florianzimm

I have the same issue here. Do you also have Istio enabled?

sass1997 avatar Jul 08 '22 09:07 sass1997

No, I'm not using istio.

nice-pink avatar Jul 08 '22 09:07 nice-pink

As far as I've read the documentation:

As a security enhancement, the argocd-repo-server Deployment uses its own Service Account instead of default.

If you have a custom environment that might depend on repo-server using the default Service Account (such as a plugin that uses the Service Account for auth), be sure to test before deploying the 2.4 upgrade to production.

Has anybody noticed that this service account needs an additional Role and RoleBinding? To me it seems that this new service account has too few rights.

Because there is no separate Role and RoleBinding, it falls back to whatever your cluster grants: either what you configured yourself or the defaults for system:serviceaccounts or system:authenticated.

sass1997 avatar Jul 08 '22 09:07 sass1997

This might not be an issue in our setup, but thanks for the hint. I'll check that.

nice-pink avatar Jul 08 '22 09:07 nice-pink

I will try to debug this and let you know which RoleBinding is needed to start this pod.

sass1997 avatar Jul 08 '22 10:07 sass1997

I can confirm that this service account is definitely missing some rights. I have now granted it everything, and the argocd-repo-server starts. I'm now working on narrowing this down to a least-privilege role.

sass1997 avatar Jul 08 '22 10:07 sass1997

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: argocd
  name: argocd-repo-server
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-repo-server
  namespace: argocd
subjects:
- kind: ServiceAccount
  name: argocd-repo-server
  apiGroup: ""
roleRef:
  kind: Role
  name: argocd-repo-server
  apiGroup: ""

I wasn't able to reduce the API groups or resources yet, but this config is working. It would be very useful if one of the maintainers could help here.

sass1997 avatar Jul 11 '22 08:07 sass1997

@sass1997 the permissionless service account is working for me in both my local docker-desktop setup and in Intuit Argo CD instances. I'm not able to reproduce the bug.

crenshaw-dev avatar Jul 11 '22 15:07 crenshaw-dev

I have just rebuilt a k8s cluster and installed Argo CD using kustomize, and I am having the exact same issue with versions 2.4.3 and 2.4.4.

Defaulted container "argocd-repo-server" out of: argocd-repo-server, copyutil (init)
time="2022-07-11T16:23:47Z" level=info msg="Generating self-signed gRPC TLS certificate for this session"
time="2022-07-11T16:23:47Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
time="2022-07-11T16:23:47Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe1962454667" dir= execID=861a3
time="2022-07-11T16:23:53Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe1962454667` failed exit status 2" execID=861a3
time="2022-07-11T16:23:53Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe1962454667]" dir= operation_name="exec gpg" time_ms=6009.55569
time="2022-07-11T16:23:53Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe1962454667` failed exit status 2"

PierreRAFFA avatar Jul 11 '22 16:07 PierreRAFFA

I have just tested installing version 2.3.5, and argocd-repo-server works.

PierreRAFFA avatar Jul 11 '22 16:07 PierreRAFFA

@sass1997 Not even adding the role and role binding solves the issue for me. Still get exactly the same error. Hm.

nice-pink avatar Jul 26 '22 09:07 nice-pink

I am seeing the same issue too, but only in the production cluster, not on my local minikube. Adjusting the roles does not change anything either.

Setting the log level to debug reveals the error:

"gpg: can't connect to the agent: End of file\ngpg: agent_genkey failed: No agent running\ngpg: key generation failed: No agent running\n"

@nice-pink is that the same for you?
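For anyone who wants to surface the same debug output, here is a minimal sketch of one way to raise the repo-server log level. It assumes the stock manifests, in which the repo-server reads its log level from the `reposerver.log.level` key of the `argocd-cmd-params-cm` ConfigMap, and the `argocd` namespace used elsewhere in this thread. It is not taken from the comment above, which does not say how the log level was changed.

```yaml
# Fragment to merge into the existing argocd-cmd-params-cm ConfigMap.
# Assumption: Argo CD was installed from the stock manifests into the
# "argocd" namespace; adjust names if your install differs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # Default is "info"; "debug" also logs the stderr output of the gpg call.
  reposerver.log.level: "debug"
```

The repo-server pod has to be restarted before it picks up the new value.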

ghost avatar Aug 02 '22 09:08 ghost

I'm seeing the original error still.

argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="Generating self-signed gRPC TLS certificate for this session"
argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403" dir= execID=f1898
argocd-repo-server time="2022-06-28T16:18:48Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403` failed exit status 2"
argocd-repo-server time="2022-06-28T16:18:48Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403]" dir= opera
argocd-repo-server time="2022-06-28T16:18:48Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403` failed exit status 2"

The whole argo setup is running in a managed OVH cluster. Haven't tried in minikube or any similar cluster so far.

nice-pink avatar Aug 02 '22 09:08 nice-pink

Mhh, might be two unrelated issues then, I will keep you posted if I find out more

ghost avatar Aug 02 '22 10:08 ghost

For us it was the pod security policies that needed updating. For a quick test, try this, which gives the namespace privileged access:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argocd-psp
rules:
- apiGroups:
  - policy
  resourceNames:
  - privileged
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argocd-psp
subjects:
- kind: Group
  name: system:serviceaccounts:argocd
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: argocd-psp
  apiGroup: rbac.authorization.k8s.io

cesarmesones avatar Aug 05 '22 22:08 cesarmesones

Hm, also no change for us. So far only disabling gpg as described in https://argo-cd.readthedocs.io/en/stable/user-guide/gpg-verification/ removes the error.
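As a rough sketch of that workaround: the linked documentation describes disabling the GnuPG feature by setting the environment variable `ARGOCD_GPG_ENABLED` to `false` on the Argo CD workloads. The patch below covers only the repo-server and assumes the stock Deployment and container name `argocd-repo-server`; the other components named in the documentation would need the same variable. It is an illustration, not the exact change applied in this thread.

```yaml
# Strategic-merge patch (e.g. for kustomize) that disables GnuPG in the
# repo-server. Assumption: Deployment and container are both named
# "argocd-repo-server", as in the stock manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
spec:
  template:
    spec:
      containers:
      - name: argocd-repo-server
        env:
        - name: ARGOCD_GPG_ENABLED
          value: "false"
```

Note that this turns off GnuPG signature verification entirely, so it only suits setups that do not rely on signed commits.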

nice-pink avatar Aug 10 '22 18:08 nice-pink

That didn't help for me either. However, changing the underlying VM image of the worker nodes did resolve the issue for us. On a standard Ubuntu-based image everything is fine now, without changing anything else.

ghost avatar Aug 10 '22 20:08 ghost

@pleic which Ubuntu version do you use for the base image?

nice-pink avatar Aug 22 '22 08:08 nice-pink

I am using ubuntu bionic, the current cloud image

ghost avatar Aug 22 '22 10:08 ghost

Experiencing the same issue here

Hm, also no change for us. So far only disabling gpg as described in https://argo-cd.readthedocs.io/en/stable/user-guide/gpg-verification/ removes the error.

Thanks @nice-pink for this workaround 👍🏻

bauerjs1 avatar Aug 26 '22 13:08 bauerjs1

We experienced the same error after upgrading from 2.2.x to 2.4.11.

In our case we had patched the deployment with the below patch. After removing it, the error disappeared and repo server could start up.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
spec:
  template:
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault

jabbors avatar Sep 12 '22 08:09 jabbors

The same issue with version v2.4.12, this workaround works for me! Thanks

shizhz avatar Sep 20 '22 09:09 shizhz

We @swisspost are also facing this issue on VMware TKGI 1.22 (= TKGI 1.13.4-build.15). Argo CD v2.4.7 (via helm chart version 4.9.16) is working fine on AWS EKS and Azure AKS but not on TKGI.

TKGI runs a really old worker OS:

$ kubectl get nodes -o wide
NAME     STATUS   ROLES    AGE     VERSION            INTERNAL-IP    EXTERNAL-IP    OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
08f...   Ready    <none>   4h20m   v1.22.6+vmware.1   172.23.129.8   172.23.129.8   Ubuntu 16.04.7 LTS   4.15.0-176-generic   docker://20.10.9

I then suspected that an Ubuntu 16 worker running an Ubuntu 22 based container image has some compatibility issues (the Argo CD 2.4 container image is based on Ubuntu 22).

Unfortunately this theory is wrong - I booted a single-node k3s cluster with Ubuntu 16 LTS, docker and k3s:

$ kubectl get nodes -o wide
NAME     STATUS   ROLES                  AGE   VERSION        INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ubuntu   Ready    control-plane,master   16m   v1.22.6+k3s1   192.168.91.205   <none>        Ubuntu 16.04.7 LTS   4.4.0-186-generic   docker://20.10.7

$ kubectl -n argocd get po
NAME                                  READY   STATUS    RESTARTS   AGE
argocd-redis-6bb9c5d89f-kh4jj         1/1     Running   0          5m57s
argocd-application-controller-0       1/1     Running   0          5m57s
argocd-repo-server-7d97f5cbdb-5tqjc   1/1     Running   0          5m56s
argocd-server-9f646bf78-qfsf4         1/1     Running   0          5m56s

$ helm ls -n argocd
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
argocd  argocd          1               2022-09-23 17:02:28.2194322 +0200 CEST  deployed        argo-cd-4.9.16  v2.4.7

Argo CD 2.4.7 is working fine here. I have no clue what else to try and filed a VMware issue in our support portal. 🙄

mkilchhofer avatar Sep 23 '22 15:09 mkilchhofer

We @swisscom have the same issue with a Kubernetes cluster based on VMware Tanzu v1.21.9+vmware.1.

  Kernel Version:             4.15.0-167-generic
  OS Image:                   Ubuntu 16.04.7 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.9
  Kubelet Version:            v1.21.9+vmware.1
  Kube-Proxy Version:         v1.21.9+vmware.1

Whoa.

Log Line

time="2022-10-06T09:10:58Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe3419147315" dir= execID=0ca0f
time="2022-10-06T09:11:04Z" level=debug msg="gpg: can't connect to the agent: End of file\ngpg: agent_genkey failed: No agent running\ngpg: key generation failed: No agent running\n" duration=6.01130389s execID=0ca0f
time="2022-10-06T09:11:04Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe3419147315` failed exit status 2" execID=0ca0f
time="2022-10-06T09:11:04Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe3419147315]" dir= operation_name="exec gpg" time_ms=6011.9106090000005
time="2022-10-06T09:11:04Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe3419147315` failed exit status 2"

argocd-repo-server

v2.4.12

gpg

argocd@argocd-repo-server-64d5df97c5-2p6xx:~$ gpg --version
gpg (GnuPG) 2.2.27
libgcrypt 1.9.4
Copyright (C) 2021 Free Software Foundation, Inc.
License GNU GPL-3.0-or-later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: /home/argocd/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

argocd@argocd-repo-server-64d5df97c5-2p6xx:~$ ldd $(which gpg)
	linux-vdso.so.1 (0x00007ffe69139000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f1a538c1000)
	libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f1a538ae000)
	libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f1a53761000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f1a53623000)
	libreadline.so.8 => /lib/x86_64-linux-gnu/libreadline.so.8 (0x00007f1a535cf000)
	libassuan.so.0 => /lib/x86_64-linux-gnu/libassuan.so.0 (0x00007f1a535b9000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f1a53591000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1a53369000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1a53282000)
	libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f1a53250000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1a539e4000)

denysvitali avatar Oct 06 '22 08:10 denysvitali

Nice! Thanks for confirming that ❤️. We have postponed the planned Argo CD upgrade on Tanzu and will wait for TKGI 1.16 in Q1 2023 🙄

mkilchhofer avatar Oct 06 '22 08:10 mkilchhofer

I managed to get everything running, but only with a completely fresh setup using the install.yaml from v2.4.13. There were quite a few changes, so it's not easy to say which diff was the important one.

nice-pink avatar Oct 06 '22 09:10 nice-pink

Seems to be related to https://dev.gnupg.org/T2203 somehow:

$ export GNUPGHOME=/app/config/gpg/keys
$ gpgconf --launch gpg-agent
gpgconf: error running '/usr/bin/gpg-connect-agent': exit status 1
gpgconf: error running '/usr/bin/gpg-connect-agent NOP': General error

$ gpg-connect-agent -v
gpg-connect-agent: no running gpg-agent - starting '/usr/bin/gpg-agent'
gpg-connect-agent: waiting for the agent to come up ... (5s)
gpg-connect-agent: waiting for the agent to come up ... (4s)
gpg-connect-agent: waiting for the agent to come up ... (3s)
gpg-connect-agent: waiting for the agent to come up ... (2s)
gpg-connect-agent: waiting for the agent to come up ... (1s)
gpg-connect-agent: can't connect to the agent: IPC connect call failed
gpg-connect-agent: error sending standard options: No agent running

$ # time_ms=6011.9106090000005 =~ 5s waiting for the gpg agent
$  gpg-agent -v --daemon
gpg-agent[46]: listening on socket '/app/config/gpg/keys/S.gpg-agent'
gpg-agent[46]: listening on socket '/app/config/gpg/keys/S.gpg-agent.extra'
gpg-agent[46]: listening on socket '/app/config/gpg/keys/S.gpg-agent.browser'
gpg-agent[46]: listening on socket '/app/config/gpg/keys/S.gpg-agent.ssh'
$ gpg-agent[47]: gpg-agent (GnuPG) 2.2.27 started

$ ldd $(which gpg-agent)
	linux-vdso.so.1 (0x00007fff349fe000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f980ea78000)
	libassuan.so.0 => /lib/x86_64-linux-gnu/libassuan.so.0 (0x00007f980ea62000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f980ea3c000)
	libnpth.so.0 => /lib/x86_64-linux-gnu/libnpth.so.0 (0x00007f980ea35000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f980e80d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f980ec0a000)

$ dpkg-query -l gpg libgcrypt20 gpg-agent
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version           Architecture Description
+++-=================-=================-============-=====================================================
ii  gpg               2.2.27-3ubuntu2.1 amd64        GNU Privacy Guard -- minimalist public key operations
ii  gpg-agent         2.2.27-3ubuntu2.1 amd64        GNU privacy guard - cryptographic agent
ii  libgcrypt20:amd64 1.9.4-3ubuntu3    amd64        LGPL Crypto library - runtime library

denysvitali avatar Oct 06 '22 09:10 denysvitali