argo-workflows Dropping of k8sapi executor makes upgrade from 3.2.6 to 3.4.8 not feasible.

Pre-requisites

[X] I have double-checked my configuration
[X] I can confirm the issue exists when I tested with :latest
[ ] I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

I expected an upgrade from 3.2.6 to 3.4.8 to be (mostly) "specify new images". It also required a small amount of tweaking of RBAC roles.

I did not expect it to require reconfiguring every workflow (many of our workflows use custom, private images with the script functionality). With the upgrade primarily being motivated by "we want the fresh ssh_known_host file" (as opposed to having to use insecureIgnoreHostKey), the amount of work needed to switch from the k8sapi to the emissary executor was completely unexpected and disappointing. As for the script-type nodes, it really should be possible for Argo/emissary to figure out what the right command is (and bordering on hard for a human to do).

We do not want the Argo workflow controller to have access to our private registries (it should not need that access, the kubernetes cluster has the required image pull capabilities), but it is not obvious what the auto-generated command for a "script" will be (so the entrypoint can be specified neither as a command nor statically in a configuration file).

Version

v3.4.8

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

n/a (this is all private images)

Logs from the workflow controller

time="2023-06-01T07:18:47.310Z" level=warning msg="Non-transient error: failed to look-up entrypoint/cmd for image \"ELIDED\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: GET https://ELIDED: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: ELIDED"

Logs from your workflow's wait container

n/a, the pod is never created

Jun 01 '23 08:06 vatine

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

Jun 18 '23 02:06 stale[bot]

One possible solution here is that for "script" actions, the emissary executor synthesises its own "command" setting. That would probably solve 98% of our incompatibility issues, and for sure the specific thing being executed is 100% within the control of the executor, so it should be relatively straight-forward.

Sep 14 '23 06:09 vatine

> We do not want the Argo workflow controller to have access to our private registries (it should not need that access, the kubernetes cluster has the required image pull capabilities)

From #8345, this line should be using your existing imagePullSecrets

> as for the script-type nodes, it really should be possible for Argo/emissary to figure out what the right command is (and bordering on hard for a human to do).

> One possible solution here is that for "script" actions, the emissary executor synthesises its own "command" setting

To clarify, this issue is entirely with script templates, right?

Based on your analysis in https://github.com/argoproj/argo-workflows/pull/12787#issuecomment-1993740179, it seemed like your suggested solution could be possible; we might be able to skip the lookup for script templates.

Upon further inspection though, it does seem to append the Docker ENTRYPOINT to the Command if it exists, and appends Args if they exist as well. You can actually specify both in a script template, which the docs mention as well.

So I'm not so sure this is a bug with emissary; you could specify a Command for a script template, but it sounds like in your case you don't, which is why it does an entrypoint lookup. My guess is that in your case you might be able to work around this with bash or exec as your command in nearly all cases.
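As an illustration of that workaround, here is a minimal sketch of a script template with an explicit command, so that no entrypoint lookup against the registry should be needed. The image name, template name, and script body are hypothetical placeholders, and `bash` is assumed to exist in the image:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: script-explicit-command-
spec:
  entrypoint: main
  templates:
    - name: main
      script:
        # Hypothetical private image; substitute your own.
        image: registry.example.com/team/tools:1.0
        # Setting command explicitly means the controller does not
        # have to look up the image's ENTRYPOINT/CMD.
        command: [bash]
        source: |
          echo "hello from a script template"
```

Whether `bash` is the right interpreter depends on what each image contains, so this would still need checking per workflow.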

Apr 19 '24 14:04 agilgur5

I guess it's more a "missing feature" than a bug. Even so, with your typical script action, it seems as if the source gets dropped into a shell script and that ends up being set as the entrypoint (further experimentation led to https://github.com/argoproj/argo-workflows/pull/12787, which fixes an issue with script permissions once a static configuration has been made; the in-house build images have now been stable for long enough that I felt it was OK to do that).
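For readers unfamiliar with the static configuration mentioned here, a rough sketch of an image index entry in the workflow controller ConfigMap follows. This is an assumption-laden example, not the actual configuration used: the image name is hypothetical, the `cmd` value is a placeholder for whatever the image really runs, and the `images`/`cmd` key names should be verified against the emissary executor documentation linked in the controller log for your version.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  images: |
    # Hypothetical private image; the key is the image reference
    # exactly as it appears in the workflow templates.
    registry.example.com/team/tools:1.0:
      # Placeholder: the command the image would run by default.
      cmd: [/bin/bash]
```

With an entry like this, the controller can presumably resolve the command for that image without contacting the registry, at the cost of keeping the index in sync with the images.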

Apr 19 '24 15:04 vatine

> it seems as if the source gets dropped into a shell script and that ends up being set as the entrypoint

It's in args, actually, as your output in https://github.com/argoproj/argo-workflows/pull/12787#issuecomment-1993740179 showed. And the source code for that is this line that I mentioned above.

As I wrote above though, the image's ENTRYPOINT and CMD get appended (when command and args are not specified) to Emissary's execution. Effectively, Emissary is a parent process that runs your (or your image's) commands as a subprocess (I only did a deep dive into the Executor in the past month or so to understand that well enough).

As such, it needs to know the image's ENTRYPOINT in order to be able to append it.

Emissary is a bit hacky, but it's the least hacky and most secure of the executors so far, as I understand it.

Apr 19 '24 15:04 agilgur5
