Image pull error: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "mynamespace"

Open vitalyrychkov opened this issue 1 year ago • 6 comments

Pre-requisites

  • [X] I have double-checked my configuration
  • [X] I can confirm the issue exists when I tested with :latest
  • [ ] I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

We store our application's container images in a private image registry, and we deploy Argo using Helm. It seems that in v3.4 the workflow controller tries to read the container image manifest (to look up the cmd/args) using the "argo-helm-argo-workflows-workflow-controller" service account from the argo namespace. With a private image registry, reading the manifest requires registry access credentials. We provide the secret with the credentials in our deployments:

  imagePullSecrets:
    - name: registry-credentials

When we submit a workflow the workflow controller's service account fails to read the registry access credentials from the secret located in the namespace of the application:

Image pull error: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "mynamespace"

Earlier, we tested one of the latest 3.3.9 builds, and it could pull and read the image successfully; see the issue https://github.com/argoproj/argo-workflows/issues/9139

We submit workflows with the argo service account in the application's namespace (--serviceaccount option), and this service account can read the secret in the same namespace. Would it be possible to use this service account to pull the image manifest? Otherwise, the user "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" would have to be able to read secrets in every namespace where an application is deployed.

Please explain how to use images from a private registry with access credentials in v3.4.0.

Version

3.4.0

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

[The issue seems to be specific to accessing credentials for private registries from the secret in the application's namespace.]

Logs from the workflow controller

time="2022-09-20T12:46:03.478Z" level=info msg="Processing workflow" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.649Z" level=info msg="Updated phase -> Running" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.649Z" level=info msg="DAG node app-adhoc-ac-db-version-1663677914 initialized Running" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.649Z" level=info msg="All of node app-adhoc-ac-db-version-1663677914.db-version dependencies [] completed" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.656Z" level=info msg="DAG node app-adhoc-ac-db-version-1663677914-749901051 initialized Running" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.656Z" level=info msg="All of node app-adhoc-ac-db-version-1663677914.db-version.db-version-task dependencies [] completed" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.666Z" level=info msg="Pod node app-adhoc-ac-db-version-1663677914-3159602526 initialized Pending" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=error msg="Mark error node" error="failed to look-up entrypoint/cmd for image "myregistry.cloud/releases/myapp:myimage", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets "app-registry-creds" is forbidden: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "san-app-test"" namespace=san-app-test nodeName=app-adhoc-ac-db-version-1663677914.db-version.db-version-task workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=info msg="node app-adhoc-ac-db-version-1663677914-3159602526 phase Pending -> Error" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=info msg="node app-adhoc-ac-db-version-1663677914-3159602526 message: failed to look-up entrypoint/cmd for image "myregistry.cloud/releases/myapp:myimage", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets "app-registry-creds" is forbidden: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "san-app-test"" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=info msg="node app-adhoc-ac-db-version-1663677914-3159602526 finished: 2022-09-20 12:46:03.674633014 +0000 UTC" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=error msg="Mark error node" error="task 'app-adhoc-ac-db-version-1663677914.db-version.db-version-task' errored: failed to look-up entrypoint/cmd for image "myregistry.cloud/releases/myapp:myimage", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets "app-registry-creds" is forbidden: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "san-app-test"" namespace=san-app-test nodeName=app-adhoc-ac-db-version-1663677914.db-version.db-version-task workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=info msg="node app-adhoc-ac-db-version-1663677914-3159602526 message: task 'app-adhoc-ac-db-version-1663677914.db-version.db-version-task' errored: failed to look-up entrypoint/cmd for image "myregistry.cloud/releases/myapp:myimage", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets "app-registry-creds" is forbidden: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "san-app-test"" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.686Z" level=info msg="Outbound nodes of app-adhoc-ac-db-version-1663677914-749901051 set to [app-adhoc-ac-db-version-1663677914-3159602526]" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.686Z" level=info msg="node app-adhoc-ac-db-version-1663677914-749901051 phase Running -> Error" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.686Z" level=info msg="node app-adhoc-ac-db-version-1663677914-749901051 finished: 2022-09-20 12:46:03.686553147 +0000 UTC" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.686Z" level=info msg="Checking daemoned children of app-adhoc-ac-db-version-1663677914-749901051" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Outbound nodes of app-adhoc-ac-db-version-1663677914 set to [app-adhoc-ac-db-version-1663677914-3159602526]" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="node app-adhoc-ac-db-version-1663677914 phase Running -> Error" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="node app-adhoc-ac-db-version-1663677914 finished: 2022-09-20 12:46:03.69151054 +0000 UTC" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Checking daemoned children of app-adhoc-ac-db-version-1663677914" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="TaskSet Reconciliation" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg=reconcileAgentPod namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Updated phase Running -> Error" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Marking workflow completed" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Marking workflow as pending archiving" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Checking daemoned children of " namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.696Z" level=info msg="cleaning up pod" action=deletePod key=san-app-test/app-adhoc-ac-db-version-1663677914-1340600742-agent/deletePod
time="2022-09-20T12:46:03.704Z" level=info msg="Workflow update successful" namespace=san-app-test phase=Error resourceVersion=100074719 workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.707Z" level=info msg="archiving workflow" namespace=san-app-test uid=e25bc895-59ec-46ae-8e39-4dea893eb0f7 workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.727Z" level=info msg="Queueing Error workflow san-app-test/app-adhoc-ac-db-version-1663677914 for delete in 5m0s due to TTL"
time="2022-09-20T12:51:04.000Z" level=info msg="Deleting garbage collected workflow 'san-app-test/app-adhoc-ac-db-version-1663677914'"
time="2022-09-20T12:51:04.014Z" level=info msg="Successfully deleted 'san-app-test/app-adhoc-ac-db-version-1663677914'"
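
The error in the controller log names one way to avoid the lookup: "explicitly specify the command". A minimal sketch of a workflow that does so follows. The image, namespace, secret name, and service account are taken from this issue. The generateName prefix, the template name, and the command and args values are hypothetical placeholders, not the real entrypoint of the image. This should mean the controller no longer has to read the image manifest, but the sketch was not tested as part of this issue.

```yaml
# Sketch only: command and args are placeholders for the image's real entrypoint.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: db-version-            # hypothetical name prefix
  namespace: san-app-test
spec:
  entrypoint: db-version-task
  serviceAccountName: argo             # service account used to run the workflow pods
  imagePullSecrets:
    - name: app-registry-creds         # still used by the kubelet to pull the image
  templates:
    - name: db-version-task
      container:
        image: myregistry.cloud/releases/myapp:myimage
        # With an explicit command there is no entrypoint/cmd for the
        # controller to look up in the image manifest.
        command: ["/app/db-version"]   # hypothetical path
        args: ["--verbose"]            # hypothetical argument
```

The trade-off is that the entrypoint is duplicated between the image and the workflow, so the two have to be kept in sync.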

Logs from your workflow's wait container

[no output, as the workflow could not be submitted due to manifest pull error]

vitalyrychkov avatar Sep 20 '22 13:09 vitalyrychkov

@vitalyrychkov can you provide your k8s version?

sarabala1979 avatar Sep 22 '22 16:09 sarabala1979

There is a PR for supporting the v1.24 service account secret change: #9620

sarabala1979 avatar Sep 22 '22 16:09 sarabala1979

@sarabala1979 K8s cluster version: 1.23.2; kubectl client version: 1.23

vitalyrychkov avatar Sep 23 '22 14:09 vitalyrychkov

@terrytangyuan will work on this.

sarabala1979 avatar Sep 23 '22 23:09 sarabala1979

Hi @sarabala1979 and @terrytangyuan

We have tried to use a private image registry with anonymous pull enabled.

We use the same image to start a pod (service) and to submit a task in Argo. The service account of the workflow-controller was given RBAC permissions to read the secret defined in the "imagePullSecrets" parameter of our deployments.
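
For illustration, a grant of this kind could look like the following sketch. The controller's service account, the namespace, and the secret name are the ones that appear in the controller log earlier in this issue. The Role and RoleBinding names are hypothetical. Restricting the rule with resourceNames limits the controller to this one secret instead of every secret in the namespace.

```yaml
# Sketch only: Role and RoleBinding names are made up for this example.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-registry-creds            # hypothetical
  namespace: san-app-test              # the application's namespace
rules:
  - apiGroups: [""]                    # core API group, where secrets live
    resources: ["secrets"]
    resourceNames: ["app-registry-creds"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workflow-controller-read-registry-creds   # hypothetical
  namespace: san-app-test
subjects:
  - kind: ServiceAccount
    name: argo-helm-argo-workflows-workflow-controller
    namespace: argo                    # the controller runs in the argo namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-registry-creds
```

A Role and RoleBinding like this would be needed in each application namespace that runs workflows with images from the private registry.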

We have tested the following scenarios:

  • Password protected access only. The imagePullSecret exists in our namespace. Our pod starts fine using the registry credentials from the secret. Submitted task starts fine using the registry credentials from the secret.

  • Anonymous access enabled. The imagePullSecret does not exist. Our pod starts fine without using registry credentials. Submitted task fails to look up the entrypoint/cmd with the error message "secrets not found".

  • Anonymous access enabled. The imagePullSecret exists in our namespace. Our pod starts fine. Submitted task starts fine using the registry credentials from the secret.

It seems that if an imagePullSecret is specified in the deployment, the workflow-controller always tries to authenticate instead of attempting an anonymous pull. Would it be possible to try the anonymous pull first and then the password-protected pull, or to add a parameter to switch between them? Shall we discuss this here or open a separate issue?

vitalyrychkov avatar Oct 05 '22 12:10 vitalyrychkov

Created a separate issue for this: https://github.com/argoproj/argo-workflows/issues/9802

vitalyrychkov avatar Oct 12 '22 14:10 vitalyrychkov