kfp-tekton icon indicating copy to clipboard operation
kfp-tekton copied to clipboard

Enhance error checks for invalid nested recursion

Open Tomcli opened this issue 4 years ago • 4 comments

/kind feature

Description: The Python DSL may not able to map the parameters correctly when a user defines a nested recursion. KFP Argo 1.5.0 is able to catch the incorrect parameters with the latest Argo CLI whereas Tekton CLI didn't check for it. Since Tekton custom task doesn't have task results to verify is the output parameters actually exist, we need to add our own validation functions within the Python object to catch this error.

e.g. example nested recursion pipeline:

from kfp import dsl
from kfp.components import load_component_from_text
from kfp_tekton.tekton import CEL_ConditionOp


PrintOp = load_component_from_text("""
  name: print
  inputs:
  - name: msg
  outputs:
  - name: stdout
  implementation:
    container:
      image: alpine:3.6
      command:
      - concat:
        - "echo "
        - { inputValue: msg }
""")

class CEL_Condition(dsl.Condition):
  def __init__(self, pred: str, name: str = None):
    super().__init__(CEL_ConditionOp(pred).output == 'true', name)

def CEL_ExprOp(expr: str):
  return CEL_ConditionOp(expr)


@dsl.pipeline("double-recursion test")
def double_recursion_test(until_a: int = 4, until_b: int = 3):
  @dsl.graph_component
  def recur_a(i: int, until_a: int):
    @dsl.graph_component
    def recur_b(j: int, until_b: int):
      print_op = PrintOp(f"Iter A: {i}, B: {j}")
      incr_j = CEL_ExprOp(f"{j} + 1").after(print_op).output
      with CEL_Condition(f"{incr_j} < {until_b}"):
        recur_b(incr_j, until_b)

    start_b = CEL_ExprOp("0").output
    with CEL_Condition(f"{start_b} < {until_b}"):
      b = recur_b(start_b, until_b)

    incr_i = CEL_ExprOp(f"{i} + 1").after(b).output
    with CEL_Condition(f"{incr_i} < {until_a}"):
      recur_a(incr_i, until_a)

  start_a = CEL_ExprOp("0").output
  with CEL_Condition(f"{start_a} < {until_a}"):
    recur_a(start_a, until_a)


if __name__ == '__main__':
  from kfp_tekton.compiler import TektonCompiler as Compiler
  # For Argo, uncomment the line below
  # from kfp.compiler import Compiler
  Compiler().compile(double_recursion_test, __file__.replace('.py', '.yaml'))

We need to raise errors for Tekton similar to the Argo CLI:

RuntimeError: Internal compiler error: Compiler has produced Argo-incompatible workflow.
Please create a new issue at https://github.com/kubeflow/pipelines/issues attaching the pipeline code and the pipeline package.
Error: time="2021-05-14T09:43:35.784Z" level=error msg="Error in file /dev/stdin: templates.double-recursion-test.tasks.condition-1 templates.condition-1.tasks.graph-recur-a-2 templates.graph-recur-a-2.tasks.condition-3 templates.condition-3.tasks.graph-recur-b-4 templates.graph-recur-b-4.tasks.condition-5 templates.condition-5.tasks.graph-recur-b-4 templates.graph-recur-b-4 inputs.parameters.condition-cel-status was not supplied"
time="2021-05-14T09:43:35.785Z" level=fatal msg="Errors encountered in validation"

Additional information: To reproduce this in Argo, users need to install kfp 1.5.0 from source. Pypi release won't have the latest Argo CLI.

Tomcli avatar May 17 '21 17:05 Tomcli

@Tomcli So miscompiling of this example is a problem in the original kubeflow-pipelines Argo compiler? Can it be fixed in kfp-tekton or does it require a fix in Argo version too?

Udiknedormin avatar May 18 '21 12:05 Udiknedormin

I will open another issue once I find a way to fix it in Tekton. This issue is for fixing the parameter validation.

Tomcli avatar May 18 '21 23:05 Tomcli

So looks like the Tekton compiler and Argo have difference issue. For Tekton I opened a new issue #599 for fixing the loop custom task location. The Argo error generated from the Argo CLI seems not relevant because the missing input parameter is actually in the same Argo template. It could be an Argo CLI bug in the 2.x version because Argo 3.x CLI has a different validation method that didn't work on KFP.

Tomcli avatar May 25 '21 00:05 Tomcli

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 02 '22 08:03 stale[bot]

/close

Tomcli avatar Aug 24 '22 16:08 Tomcli

@Tomcli: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Aug 24 '22 16:08 google-oss-prow[bot]