
feat(v2): Create a runner CLI for V2

Open zijianjoy opened this issue 3 years ago • 11 comments

Description of your changes:

  1. Follow the cobra instructions to build the runner cmd tool (see the sketch below): https://github.com/spf13/cobra/blob/master/cobra/README.md#initalizing-an-cobra-cli-application
  2. Update .cloudbuild.yaml to build the runner image.
  3. Replace all uses of the launcher and driver images with the runner image.
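
For illustration, a minimal cobra-based sketch of such a combined runner CLI might look like the following. The sub-command names and wiring are hypothetical and not the actual kubeflow/pipelines code:

```go
// A minimal sketch, assuming a cobra layout; sub-command names here are
// illustrative only, not the actual kubeflow/pipelines implementation.
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	rootCmd := &cobra.Command{
		Use:   "runner",
		Short: "Single entry point wrapping the v2 driver and launcher stages",
	}

	driveCmd := &cobra.Command{
		Use:   "drive",
		Short: "Run the driver stage",
		RunE: func(cmd *cobra.Command, args []string) error {
			// Placeholder: the existing driver logic would be invoked here.
			fmt.Println("driver stage")
			return nil
		},
	}

	launchCmd := &cobra.Command{
		Use:   "launch",
		Short: "Run the launcher stage",
		RunE: func(cmd *cobra.Command, args []string) error {
			// Placeholder: the existing launcher logic would be invoked here.
			fmt.Println("launcher stage")
			return nil
		},
	}

	rootCmd.AddCommand(driveCmd, launchCmd)
	if err := rootCmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```

With this layout, each existing stage becomes a sub-command of one binary, so a single image can serve both roles.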

Checklist:

zijianjoy avatar Feb 14 '22 16:02 zijianjoy

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please ask for approval from zijianjoy after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar Feb 14 '22 16:02 google-oss-prow[bot]

@zijianjoy before we go further down the road, can you please help me understand what this CLI is used for? Is it only for testing, or is it something necessary for shipping v2 that I might have missed?

chensun avatar Feb 15 '22 09:02 chensun

@chensun Sorry, I owe you an explanation.

Currently we have the launcher-v2 CMD tool and the driver CMD tool under https://github.com/kubeflow/pipelines/tree/master/backend/src/v2/cmd. We use them in the argo compiler, as in https://github.com/kubeflow/pipelines/blob/58e673853e5a8d5501e4d8cf5374cb8c2e3fece4/backend/src/v2/compiler/argocompiler/argo.go#L111-L113.

However, we can merge these two CMD tools into one for simplicity. As you have seen, the license review process for each image requires some manual effort, so it is better to keep the number of images as small as possible.

As a result, I am introducing a new CMD tool, runner. The way to use the launcher and driver stays the same, except that you call the CMD tool as runner drive ... instead of driver ....
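
To make the invocation change concrete, here is a small illustrative sketch (hypothetical values; the stage-specific flags are defined by the existing driver and launcher binaries and are not changed by this PR):

```go
// Sketch only (hypothetical values): the stage-specific flags stay the
// same; only the entry point of the container command changes.
package main

import "fmt"

func main() {
	// before: one dedicated binary/image per stage
	oldDriver := []string{"driver"}
	oldLauncher := []string{"launcher-v2"}
	// after: a single runner image, with each stage as a sub-command
	newDriver := []string{"runner", "drive"}
	newLauncher := []string{"runner", "launch"}
	fmt.Println(oldDriver, "->", newDriver)
	fmt.Println(oldLauncher, "->", newLauncher)
}
```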

You can learn more from the PR description. The PR is now ready for review.

zijianjoy avatar Feb 16 '22 06:02 zijianjoy

The title should probably be chore, because this does not affect user-facing features.

Bobgy avatar Feb 16 '22 07:02 Bobgy

@zijianjoy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| kubeflow-pipelines-v2-go-test | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipelines-v2-go-test |
| kubeflow-pipeline-backend-test | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipeline-backend-test |
| kubeflow-pipeline-e2e-test | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipeline-e2e-test |
| kubeflow-pipeline-upgrade-test | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipeline-upgrade-test |
| kubeflow-pipelines-samples-v2 | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipelines-samples-v2 |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

google-oss-prow[bot] avatar Feb 16 '22 07:02 google-oss-prow[bot]

@zijianjoy Thank you for the detailed explanation. Combined with the latest iteration, it makes much more sense now. Yes, we have discussed the idea of combining driver and launcher into a single image, and I'm fully on board with that proposal.

I wasn't expecting the change at this moment, though, as I think this refactor doesn't necessarily block the v2 launch (please correct me if I'm wrong). It could happen at a much later stage, post-beta for example, assuming we have enough bandwidth left before the v2 launch.

One of the reasons behind this preference is that the concepts and naming of driver and launcher already feel a bit confusing, and adding a runner may make it even worse. I wonder if we have considered reducing the concepts/naming to a single one. For reference, in the KFP SDK we did something similar, albeit not apples-to-apples: when a user writes a Python lightweight component, the SDK "injects" a binary called executor at runtime, and it does something similar to driver and launcher combined, on a different level, including preparing input values/objects, passing the inputs and executing the user function, and collecting artifact metadata and writing it somewhere the backend system can read from. So I'm not sure it's necessary or worth having so many abstract concepts like runner, driver/drive, launcher/launch. Maybe we just need one, with some concrete actions under it, for example runner.get_inputs, runner.run, runner.upload_metadata, etc. (just for illustration; I haven't given it too much thought).
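
To sketch that single-concept idea (purely illustrative; the action names come from the example above, not from any existing KFP API), it might look something like:

```go
// A rough sketch of one "runner" concept with concrete actions under it,
// instead of separate driver/launcher abstractions; names are hypothetical.
package main

import "fmt"

type Runner struct{}

func (r *Runner) GetInputs() error      { fmt.Println("resolve and prepare inputs"); return nil }
func (r *Runner) Run() error            { fmt.Println("execute the user command"); return nil }
func (r *Runner) UploadMetadata() error { fmt.Println("record artifact metadata for the backend"); return nil }

func main() {
	r := &Runner{}
	steps := []func() error{r.GetInputs, r.Run, r.UploadMetadata}
	for _, step := range steps {
		if err := step(); err != nil {
			fmt.Println("step failed:", err)
			return
		}
	}
}
```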

All in all, my point is that such refactoring isn't necessarily a top-priority task for now, and when it's time to do it, it might need a bit more thinking and design than simply making two commands sub-commands under one. WDYT? Of course, if you're very close to getting this done, I'm not going to block it. We can always iterate on it.

/cc @Bobgy if you'd like to share some thoughts as well.

chensun avatar Feb 16 '22 09:02 chensun

I agree there are many special terms, but maybe it's not a huge problem, because they are only internal terms: KFP users do not need to understand them. My impression is that the terms were invented to describe the KFP architecture; as long as the architecture is complex enough to have these separate stages (each one uniquely different from the others), there is a need to give each stage a name. Otherwise, we cannot effectively discuss the architecture and write the code. One good thing is that we at least have consistent terms: every time we have a name, it's clearly defined and refers to just one thing in the architecture (both in design and code), see http://bit.ly/kfp-v2.

I'm not looking at the whole picture, but my impression is that managing one image instead of two is already a worthwhile saving for the release process, so it reduces the total cost of the alpha release. The PR is 70% done. What's left is fixing the sample test to replace the old images with the new one and cleaning up the code for the old flags.

Bobgy avatar Feb 16 '22 10:02 Bobgy

Another thought: maybe it's better to name this new CLI kfpexec? The name comes from argoexec. If people inspect the underlying KFP v2 Pods, they'll easily find argoexec and kfpexec as two wrapper CLIs around the actual user command. Similar naming can probably make their similar purposes easier to understand.

To explain, the purpose of argoexec is exactly the same as runner's: it's a CLI tool that includes code (split into different sub-commands) for all stages inside a workflow Pod during argo workflow execution.

Just for your information, feel free to discuss and choose whatever name makes the most sense to you. I don't see a strong difference; runner may be good enough as well.

Bobgy avatar Feb 16 '22 11:02 Bobgy

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 13 '22 04:06 stale[bot]

/lifecycle frozen

zijianjoy avatar Jun 23 '22 06:06 zijianjoy

@zijianjoy: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Jun 23 '22 06:06 google-oss-prow[bot]