argo-workflows
Controller sends many pod delete requests that result in 404 responses
Pre-requisites
- [X] I have double-checked my configuration
- [X] I can confirm the issue exists when I tested with :latest
- [X] I have searched existing issues and could not find a match for this bug
- [ ] I'd like to contribute the fix myself (see contributing guide)
What happened/what did you expect to happen?
Although this seems to have no effect on the functioning of Argo Workflows, it could become a stability, performance, or Kubernetes log-cost issue at scale. These requests must also be filling up the cleanup queue, which delays the cleanup of pods that really do exist.
From checking the k8s API server, the workflow controller seems to be sending a delete pod request for <podname from a step>-agent and getting a "not found" response. I am using standard workflows like the whalesay example. I'm not sure what the significance of the agent suffix is (I did see https://github.com/argoproj/argo-workflows/blob/66680f1c9bca8b47c40ce918b5d16714058647cb/workflow/controller/agent.go#L25).
I'm seeing 1000s of these. It seems that for every pod run, the controller sends this unnecessary delete request for a pod with the -agent suffix?
Version
3.4.11
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
n/a
Logs from the workflow controller
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
n/a
Logs from in your workflow's wait container
kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
n/a
For reference, this was previously noted in Slack.
not sure what significance of agent suffix is
As far as I understand, the "agent" is a piece of the Executor that runs for certain built-in, non-container template types (e.g. resource, http, data, script, etc.). Anyone else, please correct me if I'm wrong; the historical references to the agent didn't quite have a standard definition.
seeing 1000s of these, seems for every pod run its sending this unrequired delete request for pod with -agent suffix?
That sounds like it might be accidentally assuming that each Pod has an agent, when there are only certain types that do 🤔
https://github.com/argoproj/argo-workflows/blob/66680f1c9bca8b47c40ce918b5d16714058647cb/workflow/controller/operator.go#L2369 seems to be the line assuming each pod has an agent.
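To make the suspected cost concrete, here is a minimal, hypothetical sketch (not the controller's actual code) of what happens if cleanup assumes every step pod has a companion `<pod>-agent` pod, as observed in the API server logs. The in-memory `fakeAPIServer`, `errNotFound`, and `cleanupUnconditional` are all invented for illustration; `errNotFound` stands in for the API server's 404.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the API server's 404 response.
var errNotFound = errors.New("pod not found")

// fakeAPIServer is a hypothetical in-memory stand-in for the pods API.
type fakeAPIServer struct {
	pods map[string]bool
}

// deletePod removes a pod, or returns errNotFound if no such pod exists.
func (s *fakeAPIServer) deletePod(name string) error {
	if !s.pods[name] {
		return errNotFound
	}
	delete(s.pods, name)
	return nil
}

// cleanupUnconditional mimics the suspected pattern: cleanup of every step
// pod also issues a delete for "<pod>-agent", whether or not that pod exists.
// It returns how many of the delete requests came back as "not found".
func cleanupUnconditional(s *fakeAPIServer, stepPods []string) int {
	notFound := 0
	for _, p := range stepPods {
		for _, name := range []string{p, p + "-agent"} {
			if err := s.deletePod(name); errors.Is(err, errNotFound) {
				notFound++
			}
		}
	}
	return notFound
}

func main() {
	s := &fakeAPIServer{pods: map[string]bool{"whalesay-1": true, "whalesay-2": true}}
	fmt.Println(cleanupUnconditional(s, []string{"whalesay-1", "whalesay-2"})) // 2
}
```

For a whalesay-style workflow with no agent pod, every step pod produces exactly one wasted request, which matches the "one 404 per pod run" pattern reported above.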
The agent pod will only be created if taskSet is not empty. Each workflow can have at most one agent pod.
https://github.com/argoproj/argo-workflows/blob/5c8062e55d975aab117e37c7592c0a648a9e9860/workflow/controller/agent.go#L32-L46
Only http and plugin templates are put into taskSet right now.
@jswxstw do u want to create PR?
I see you have created a PR, and it basically looks good to me. My only suggestion is that you can use the existing function woc.hasTaskSetNodes() to determine whether deleting the agent pod is necessary, rather than creating a new function woc.hasNodeWithAgentPod().
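A minimal sketch of the guard suggested in review, under stated assumptions: the boolean `hasTaskSetNodes` parameter plays the role of the existing `woc.hasTaskSetNodes()` check, while `podsToDelete` and the `agentPodName` argument are hypothetical. The real agent pod naming is not shown in this thread, so it is passed in rather than computed.

```go
package main

import "fmt"

// podsToDelete builds the deletion list for a completed workflow.
// hasTaskSetNodes stands in for the existing woc.hasTaskSetNodes() check;
// agentPodName is a hypothetical, caller-supplied name for the agent pod.
func podsToDelete(stepPods []string, agentPodName string, hasTaskSetNodes bool) []string {
	// Copy so the caller's slice is never modified.
	out := append([]string(nil), stepPods...)
	// Queue the agent pod only when one could have been created;
	// otherwise the delete request would just come back as 404.
	if hasTaskSetNodes {
		out = append(out, agentPodName)
	}
	return out
}

func main() {
	pods := []string{"whalesay-1"}
	fmt.Println(podsToDelete(pods, "wf-agent", false)) // [whalesay-1]
	fmt.Println(podsToDelete(pods, "wf-agent", true))  // [whalesay-1 wf-agent]
}
```

Reusing one existing predicate keeps the "was an agent pod created?" and "should it be deleted?" decisions tied to the same condition, which is presumably the point of the review comment.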