argo-workflows
Inline templates still do not create the pod name correctly
Pre-requisites
- [X] I have double-checked my configuration
- [X] I can confirm the issue exists when I tested with `:latest`
- [X] I have searched existing issues and could not find a match for this bug
- [ ] I'd like to contribute the fix myself (see contributing guide)
What happened/what did you expect to happen?
This still looks like the bug from https://github.com/argoproj/argo-workflows/issues/10912, only that now, with version v3.5.5, argo no longer even realizes that the pod is already completed.

The pod argo tries to refer to is named `fantastic-python-2118328171`, as can be seen in the UI (for the workflow, see the example further down). It is stuck in Pending because that pod is not found on the Kubernetes cluster: when I look at the cluster, I see a pod called `fantastic-python--2118328171`, which is either running or already done. Since argo apparently uses the wrong name to fetch the pod and its status, the workflow does not proceed. Even if it did continue, the logs would not be visible in the argo UI; see the referenced bug report for that.

You can even see in the controller logs that it created the pod using the double `-` and then tries to pull the pod's status using only a single `-`. (See the logs further down: the first log line is returned when I search for the pod name with the double `-`, the other lines when I search with a single `-`.)
If I understood your fix for https://github.com/argoproj/argo-workflows/issues/10912 correctly, you tried to fix a problem with the naming of the task, but I think the issue goes deeper: as far as I understand, the pod name should always be something like `{workflow-name}-{workflow-id}-{step-name}-{step-id}`, but when I use the `inline` option rather than the `template` option to reference another template, the `step-name` is simply not added to the pod name.

Please have a look at it, since I need to use the `inline` option because I generate the workflows automatically, and using the `template` option would make everything much, much harder.
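To make the mismatch concrete, here is a hypothetical sketch (not argo's actual code; the function names and segment layout are assumptions for illustration) of how naively joining name segments with `-` produces the double dash when the step-name segment is empty, as appears to happen for inline templates:

```python
def pod_name(workflow_name: str, step_name: str, node_id: str) -> str:
    """Naive join: if step_name is empty (as for inline templates),
    the result contains a double dash."""
    return "-".join([workflow_name, step_name, node_id])

def pod_name_fixed(workflow_name: str, step_name: str, node_id: str) -> str:
    """Skip empty segments so creation and lookup agree on the name."""
    return "-".join(s for s in (workflow_name, step_name, node_id) if s)

# Naive join reproduces the name the controller created the pod with:
print(pod_name("fantastic-python", "", "2118328171"))        # fantastic-python--2118328171
# Filtering empty segments matches the name used to look the pod up:
print(pod_name_fixed("fantastic-python", "", "2118328171"))  # fantastic-python-2118328171
```

The bug described above is effectively that pod creation behaves like the first function while the status lookup behaves like the second, so the two sides never agree on the pod's name.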
Version
v3.5.5
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: fantastic-python
  namespace: default
spec:
  entrypoint: argosay
  templates:
    - name: argosay
      dag:
        tasks:
          - name: some-task
            inline:
              container:
                name: main
                image: argoproj/argosay:v2
                command:
                  - /argosay
                args:
                  - echo
                  - 'Hello argo'
```
Logs from the workflow controller
```
time="2024-04-15T12:44:16.689Z" level=info msg="Created pod: fantastic-python.some-task (fantastic-python--2118328171)" namespace=default workflow=fantastic-python
time="2024-04-15T12:44:16.678Z" level=warning msg="was unable to obtain the node for fantastic-python-2118328171, taskName some-task"
time="2024-04-15T12:44:16.678Z" level=warning msg="was unable to obtain the node for fantastic-python-2118328171, taskName some-task"
time="2024-04-15T12:44:16.678Z" level=info msg="Pod node fantastic-python-2118328171 initialized Pending" namespace=default workflow=fantastic-python
time="2024-04-15T12:44:26.691Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:44:36.695Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:44:46.711Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:44:56.714Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:45:06.717Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
time="2024-04-15T12:45:16.722Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=fantastic-python-2118328171 old.message= old.phase=Pending old.progress=0/1 workflow=fantastic-python
```
Logs from in your workflow's wait container
```
time="2024-04-15T12:59:16.660Z" level=info msg="Starting Workflow Executor" version=v3.4.8
time="2024-04-15T12:59:16.661Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-04-15T12:59:16.661Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=default podName=fantastic-python--2118328171 template="{\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"main\",\"image\":\"argoproj/argosay:v2\",\"command\":[\"/argosay\"],\"args\":[\"echo\",\"Hello argo\"],\"resources\":{}}}" version="&Version{Version:v3.4.8,BuildDate:2023-05-25T22:21:53Z,GitCommit:9e27baee4b3be78bb662ffa5e3a06f8a6c28fb53,GitTag:v3.4.8,GitTreeState:clean,GoVersion:go1.20.4,Compiler:gc,Platform:linux/arm64,}"
time="2024-04-15T12:59:16.661Z" level=info msg="Starting deadline monitor"
time="2024-04-15T12:59:18.663Z" level=info msg="Main container completed" error="<nil>"
time="2024-04-15T12:59:18.663Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-04-15T12:59:18.663Z" level=info msg="No output parameters"
time="2024-04-15T12:59:18.663Z" level=info msg="No output artifacts"
time="2024-04-15T12:59:18.663Z" level=info msg="Alloc=9863 TotalAlloc=15937 Sys=24429 NumGC=4 Goroutines=7"
```
This looks like the same root cause as https://github.com/argoproj/argo-workflows/issues/12895, and it will be fixed by https://github.com/argoproj/argo-workflows/pull/12928.
Mmh, okay, it looks like it, yes. I was somehow only able to find two already-closed issues on this and never found the open one on Google. Thank you for the quick response!