pipelines
pipelines copied to clipboard
feat(Backend + SDK): Update kfp backend and kubernetes sdk to support EmptyDir
Description of your changes: Update kfp backend and kubernetes sdk to support mounting EmptyDir volumes to task pods.
(Based on #10892 and needs rebase after that is merged first)
Inspired by #10427
Fixes: #10656
Checklist:
- [x] The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.
Hi @gregsheremeta. Thanks for your PR.
I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
Thanks for tackling this! It's an important gap between v1 and v2.
@chensun can I ask for your review, or can you recommend someone to review? thanks.
@gregsheremeta I can help with the backend review, but for SDK we have only @connor-mccarthy. I can review both and see how can we proceed with approvals.
thanks. For what it's worth, the SDK piece is fairly simple, and Chen gave a preliminary lgtm over in #10892. It follows the (already approved and merged) example set in #10410. The key thing to review there is that I included the important fields. When you are doing an emptydir mount in kube, there are four pieces of information required (ref: https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/volume/#local-temporary-directory) : name, mount path, medium (optional), and size limit (optional)
actually, you may have been referring to the code in empty_dir.py :) In which case, kind of the same answer. It has the same pattern as (say) secrets but that I have these 4 fields instead.
The failing tests are expected. When #10892 is merged, I'll rebase this, and then the tests will work
@chensun @james-jwu @zijianjoy @connor-mccarthy can I ask for your review, or can you recommend someone to review? thanks.
/rerun-all
/lgtm
@chensun @connor-mccarthy I am not sure why these tests are failing, but I would appreciate your review of this feature, as not being able to mount emptyDir to tasks is preventing a number of people updating to KFP v2.
If you are wondering why this is important, it's because PyTorch expects there to be a tempfs mounted at /dev/shm, and the only way to ensure this in a Kubernetes Pod is to mount an emptyDir with medium="Memory" to it.
@gregsheremeta FYI all the GitHub Actions tests are mandatory. Some of the Prow ones may be optional. You can check which of them are optional here. And some of them should have been already removed. There are several PRs to remove the ones that we migrated to GHA. They are waiting for review.
@gregsheremeta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| kubeflow-pipeline-e2e-test | a8374c2758f9e772cf4b7fefab52a5cdf35351d3 | link | false | /test kubeflow-pipeline-e2e-test |
| kfp-kubernetes-execution-tests | a8374c2758f9e772cf4b7fefab52a5cdf35351d3 | link | false | /test kfp-kubernetes-execution-tests |
| kubeflow-pipeline-upgrade-test | 621469b3937d12b681de7822e5f2571cb841b0de | link | false | /test kubeflow-pipeline-upgrade-test |
| kubeflow-pipelines-samples-v2 | 621469b3937d12b681de7822e5f2571cb841b0de | link | false | /test kubeflow-pipelines-samples-v2 |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Added some nitpick error handling and logging suggestions for the driver code. The Kubernetes Platform / SDK code and the corresponding tests LGTM.
/lgtm
/lgtm /approve
Thanks!
@chensun this is ready for your final review / merge. Can you take a look?
@chensun bumping :)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chensun, HumairAK
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [chensun]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Congrats on the merge, @gregsheremeta. Thank you for tackling this meaningful contribution! 🔥
this should now be available in KFP 2.3.0 / SDK 2.9.0