argo-workflows icon indicating copy to clipboard operation
argo-workflows copied to clipboard

fix: Apply execution control to taskset nodes that are not part of exit handler and only delete agent pod if exists.

Open jswxstw opened this issue 9 months ago • 1 comments

Fixes #12659 Fixes #13060

Motivation

  1. This PR https://github.com/argoproj/argo-workflows/pull/12441 has support plugin nodes shutdown, but HTTP node still not.
  2. Fixes #12659
  3. Fixes #13060

Modifications

Modification 1

Execution control has been applied to the nodes with created pods after pod reconciliation. However, pending and suspended nodes do not have created pods, and taskset nodes use the agent pod. Mark these nodes failed when shutting down or exceeding deadline since pod reconciliation does not take effect on them.

Modification 2

Agent pod will only be created when taskset nodes exist, so do not delete it if it does not exist.

Modification 3

Skip execution control for exit nodes.

Verification

jswxstw avatar May 07 '24 13:05 jswxstw

:shipit:

tooptoop4 avatar Jun 13 '24 22:06 tooptoop4

make some comments in pr(https://github.com/argoproj/argo-workflows/pull/13185) about IsExitNode function Or maybe need to be clear about the definition of an ExitNode

  1. The root node of the exit handle
  2. Any one of all the nodes involved in the exit handle

JasonChen86899 avatar Jun 24 '24 08:06 JasonChen86899

Or maybe need to be clear about the definition of an ExitNode

I think we can do it this way: keep the existing logic of IsExitNode and add a new function IsPartOfExitHandler to determine if the node is in the Exit Handler.

jswxstw avatar Jun 24 '24 09:06 jswxstw

// IsPartOfExitHandler returns whether node is part of exit handler.
func (woc *wfOperationCtx) IsPartOfExitHandler(node wfv1.NodeStatus) bool {
	if node.BoundaryID == "" {
		return node.IsExitNode()
	}
	if boundaryNode, err := woc.wf.Status.Nodes.Get(node.BoundaryID); err == nil {
		return boundaryNode.IsExitNode()
	}
	return false
}

A for loop or recursion is needed here. Here's a modification of my code based on yours

// IsPartOfExitHandler returns whether node is part of exit handler.
func (woc *wfOperationCtx) IsPartOfExitHandler(node wfv1.NodeStatus) (bool, error) {
	tmpNode := node
	for !tmpNode.IsExitNode() {
		if tmpNode.BoundaryID == "" {
			return false, nil
		}
		// get boundary node
		boundaryNode, err := woc.wf.Status.Nodes.Get(tmpNode.BoundaryID)
		if err != nil {
			return false, err
		}

		tmpNode = *boundaryNode
	}
	
	return true, nil
}

JasonChen86899 avatar Jun 25 '24 03:06 JasonChen86899

Modification 2

Agent pod will only be created when taskset nodes exist, so do not delete it if it does not exist.

@jswxstw can you split the agent delete into a separate PR? it's independent of the other changes here and supersedes #12679, which we both already reviewed, so that one I can review and merge almost immediately. (It's also a lot simpler/has a lot less edge cases to review than the rest) Can assign me on that split PR

agilgur5 avatar Jul 02 '24 06:07 agilgur5

New PR: #13294 @agilgur5, I don't have permission to assign.

jswxstw avatar Jul 02 '24 06:07 jswxstw

New PR: https://github.com/argoproj/argo-workflows/pull/13294

Thanks! Approved. Should merge once tests succeed.

I don't have permission to assign.

Ah right. For future reference, I think you can "Request a Review" which will at least send me a notification

agilgur5 avatar Jul 02 '24 06:07 agilgur5