
Add a `pendingTimeout` parameter

Open hadim opened this issue 2 years ago • 1 comments

Summary

See https://github.com/argoproj/argo-workflows/issues/3572 for context.

Some of our workflows fail to schedule on a k8s node because there are sometimes errors in the configuration that is responsible for executing a workflow.

The currently available option activeDeadlineSeconds covers both the pending phase and the execution phase. We need an option that only considers the pending phase, so that a workflow stuck in pending would be marked as failed after xxx seconds.

This new option could be pendingTimeoutSeconds or pendingDeadlineSeconds.


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

hadim avatar Jan 11 '23 00:01 hadim

I investigated this issue and when I looked into a possible solution, I ran into one that actually already exists: #3686

The template timeout field as currently documented (https://argo-workflows.readthedocs.io/en/latest/fields/#template) sounds like a duplicate of the activeDeadlineSeconds field. As actually implemented, however, the node StartedAt time is when the workflow node was created, and timeout is only considered for nodes in the NodePending phase, which makes timeout behave more like pendingTimeout in practice.

I have verified that specifying timeout: 600s in my templates does indeed prevent them from spending more than 600s in Pending state, while allowing them to run for however long they need to.
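For reference, a minimal sketch of what such a template might look like. Only the `timeout: 600s` line comes from the setup described above; the workflow name, template name, image, and command are placeholders chosen for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pending-timeout-   # placeholder name
spec:
  entrypoint: main
  templates:
    - name: main                   # placeholder template name
      timeout: 600s                # template-level timeout discussed in this thread
      container:
        image: alpine:3.19         # placeholder image
        command: [sh, -c]
        args: ["sleep 30"]         # placeholder workload
```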

Perhaps some improvement to the documentation is in order?

drawlerr avatar Jan 22 '24 17:01 drawlerr

Update: while the timeout parameter does seem to catch pods that have been pending too long, it has some issues and consequences:

  • The template timeout is only evaluated "incidentally" and is not guaranteed to be evaluated close to the expiration time, so it's more of a "minimum" than a "maximum" parameter.
  • The template timeout is transferred to activeDeadlineSeconds if that param is unset or greater than the template timeout. So, timeout is not just applicable to the pending state but is rather a full end-to-end timeout.

I am investigating other options for resolution.

drawlerr avatar Feb 29 '24 15:02 drawlerr