feat(v2): Create a runner CLI for V2
Description of your changes:
- Following the cobra instructions to build the `runner` cmd tool: https://github.com/spf13/cobra/blob/master/cobra/README.md#initalizing-an-cobra-cli-application
- Update `.cloudbuild.yaml` to build the runner image.
- Replace all usages of the launcher and driver images with the runner image.
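For reference, a minimal sketch of what a cobra-based `runner` entrypoint with `drive` and `launch` sub-commands could look like; the wiring into the existing driver and launcher code is elided, and everything besides the sub-command names is illustrative rather than the actual code in this PR:

```go
// Minimal sketch only; the real sub-commands would wrap the existing driver
// and launcher implementations under backend/src/v2/cmd.
package main

import (
	"log"

	"github.com/spf13/cobra"
)

func main() {
	rootCmd := &cobra.Command{
		Use:   "runner",
		Short: "KFP v2 runner: driver and launcher merged into a single binary",
	}

	rootCmd.AddCommand(
		&cobra.Command{
			Use:   "drive",
			Short: "Run the driver stage",
			RunE: func(cmd *cobra.Command, args []string) error {
				// Delegate to the existing driver code here.
				return nil
			},
		},
		&cobra.Command{
			Use:   "launch",
			Short: "Run the launcher stage",
			RunE: func(cmd *cobra.Command, args []string) error {
				// Delegate to the existing launcher code here.
				return nil
			},
		},
	)

	if err := rootCmd.Execute(); err != nil {
		log.Fatal(err)
	}
}
```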
Checklist:
- [x] The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: To complete the pull request process, please ask for approval from zijianjoy after the PR has been reviewed.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@zijianjoy before we go further down the road, can you please help me understand what this CLI is used for? Is it only for testing, or is it something necessary to ship v2 that I might have missed?
@chensun Sorry I owe you an explanation.
So currently we have the `launcher-v2` CMD tool and the `driver` CMD tool under https://github.com/kubeflow/pipelines/tree/master/backend/src/v2/cmd. We are using them in the argo compiler, as in https://github.com/kubeflow/pipelines/blob/58e673853e5a8d5501e4d8cf5374cb8c2e3fece4/backend/src/v2/compiler/argocompiler/argo.go#L111-L113.
However, we can merge these two CMD tools into one for simplicity. As you have seen, the license review process for each image requires some manual effort, so it is better to keep the number of images to a minimum.
As a result, I am introducing a new CMD tool, `runner`. The way to use the launcher and driver stays the same, except that you invoke the CMD tool with `runner drive ...` instead of `driver ...`.
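To make the invocation change concrete, here is a rough sketch; the old entrypoint names simply follow the CMD tool names above, the flag lists are elided, and none of this is copied from the actual compiler code:

```go
package main

import "fmt"

// Rough illustration only: each stage keeps its existing flags, and only the
// entrypoint changes from a dedicated binary to a sub-command of runner.
func main() {
	oldDriver := []string{"driver" /* existing driver flags */}
	newDriver := []string{"runner", "drive" /* existing driver flags */}

	oldLauncher := []string{"launcher-v2" /* existing launcher flags */}
	newLauncher := []string{"runner", "launch" /* existing launcher flags */}

	fmt.Println(oldDriver, "->", newDriver)
	fmt.Println(oldLauncher, "->", newLauncher)
}
```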
You can learn more from the PR description. The PR is now ready for review.
The title should probably be `chore`, because this does not affect user-facing features.
@zijianjoy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| kubeflow-pipelines-v2-go-test | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipelines-v2-go-test |
| kubeflow-pipeline-backend-test | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipeline-backend-test |
| kubeflow-pipeline-e2e-test | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipeline-e2e-test |
| kubeflow-pipeline-upgrade-test | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipeline-upgrade-test |
| kubeflow-pipelines-samples-v2 | ebae452ce5f03221309cb5e406721981da606f68 | link | true | /test kubeflow-pipelines-samples-v2 |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@zijianjoy Thank you for the detailed explanation. Combined with the latest iteration, they make much more sense now. Yes, we have discussed the idea of combining driver and launcher into a single image, and I'm fully onboard with that proposal.
I wasn't expecting the change at this moment, though, as I think this refactor doesn't necessarily block the v2 launch (please correct me if I'm wrong). It could happen at a much later stage, post-beta for example, assuming we have enough bandwidth left before the v2 launch.
One of the reasons behind this preference is that I feel the concepts and naming of driver and launcher are already a bit confusing, and adding a runner may make it even worse. I wonder if we have considered reducing the concepts/naming to one. For reference, in the KFP SDK we had something similar, albeit not apples-to-apples: when a user writes a Python lightweight component, the SDK "injects" a binary called executor at runtime, and it does something similar to the driver and launcher combined, on a different level, including preparing input values/objects, passing the inputs and executing the user function, and collecting artifact metadata and writing it somewhere the backend system can then read from.
So I'm not sure it's necessary or worthwhile to have so many abstract concepts like runner, driver/drive, launcher/launch. Maybe we just need one with some concrete actions under it, for example `runner.get_inputs`, `runner.run`, `runner.upload_metadata`, etc., just for illustration; I haven't given it too much thought.
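For illustration only, a tiny sketch of what that single-concept layout could look like; the action names below just echo the hypothetical examples above and do not exist in the codebase:

```go
package main

import (
	"log"

	"github.com/spf13/cobra"
)

// Hypothetical sketch: one "runner" concept with concrete actions as
// sub-commands, instead of separate driver/launcher notions.
func main() {
	root := &cobra.Command{Use: "runner"}

	for _, action := range []string{"get-inputs", "run", "upload-metadata"} {
		root.AddCommand(&cobra.Command{
			Use:   action,
			Short: "Hypothetical action: " + action,
			RunE: func(cmd *cobra.Command, args []string) error {
				// Concrete stage logic would live here.
				return nil
			},
		})
	}

	if err := root.Execute(); err != nil {
		log.Fatal(err)
	}
}
```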
All in all, my point is that such refactoring isn't necessarily a top-priority task for now, and when it's time to do it, it might need a bit more thinking and design than simply making the two commands sub-commands under one. WDYT? Of course, if you're very close to getting this done, I'm not going to block it. We can always iterate on it.
/cc @Bobgy if you'd like to share some thoughts as well.
I agree there are many special terms, but maybe it's not a huge problem, because they are only internal terms -- KFP users do not need to understand them. My impression is that the terms were invented to describe the KFP architecture; as long as the architecture is complex enough to have these separate stages (each one uniquely different from the others), there's a need to give each stage a name. Otherwise, we cannot effectively discuss the architecture and write the code. One good thing is that we at least have consistent terms: every time we have a name, it's clearly defined and refers to just one thing in the architecture (both in design and code) http://bit.ly/kfp-v2.
I'm not looking at the whole picture, but my impression is that managing one image instead of two is already a good enough saving for the release process, so it reduces the total cost of the alpha release. The PR is 70% done. What's left is fixing the sample test to replace the old images with the new one and cleaning up code for the old flags.
Another thought: maybe it's better to name this new CLI kfpexec? The name comes from argoexec. If people inspect the underlying KFP v2 Pods, they'll easily find argoexec and kfpexec as two wrapper CLIs around the actual user command. Similar naming can probably make their similar purposes easier to understand.
To explain, the purpose of argoexec is exactly the same as runner's: it's a CLI tool that includes code (split into different sub-commands) for all stages inside a workflow Pod during Argo workflow execution.
Just for your information, feel free to discuss and choose whatever name makes the most sense to you. I don't have a strong preference; runner may be good enough as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
@zijianjoy: The lifecycle/frozen label cannot be applied to Pull Requests.
In response to this:
/lifecycle frozen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.