Proposal: Donate the MPI-Operator.V2 to kubernetes-sigs

Open ArangoGutierrez opened this issue 2 years ago • 9 comments

Signed-off-by: Carlos Eduardo Arango Gutierrez [email protected]

ArangoGutierrez avatar Mar 11 '22 22:03 ArangoGutierrez

/cc @alculquicondor

ArangoGutierrez avatar Mar 11 '22 22:03 ArangoGutierrez

https://github.com/kubeflow/mpi-operator/issues/459

ArangoGutierrez avatar Mar 11 '22 22:03 ArangoGutierrez

@kubeflow/project-steering-group

terrytangyuan avatar Mar 13 '22 03:03 terrytangyuan

/assign @bobgy

ArangoGutierrez avatar Mar 14 '22 14:03 ArangoGutierrez

/assign @theadactyl

alculquicondor avatar Mar 14 '22 15:03 alculquicondor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ArangoGutierrez
To complete the pull request process, please ask for approval from bobgy after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow[bot] avatar Mar 14 '22 15:03 google-oss-prow[bot]

Thanks for opening this! I don't see an assessment anywhere of how this impacts existing users of the MPI operator and/or the unified operator, what the implications are for the unified operator going forward, or how this might impact other parts of Kubeflow. I think that's important to include. Can you add it?

Additionally, what in particular is the value of donation here, versus just maintaining the MPI operator within Kubeflow with fewer dependencies that would block HPC users? Donation isn't a lightweight process or decision, so I'm interested to see the specific advantages and disadvantages, so that we can call out and validate any assumptions being made here.

Also, when there are a couple more updates, I'd like to get feedback from the community. This would benefit from being sent to kubeflow-discuss & having a spot for discussion in an upcoming Community Meeting.

How does that sound?

theadactyl avatar Mar 14 '22 17:03 theadactyl

I would like to see a more detailed proposal for the migration plan. Specifically:

  • How do we avoid having two divergent versions of the MpiJob?
  • Assuming that the new MpiJob replaces the current version, how do we handle installation of the new MPI operator?
  • How do we handle common dependencies?
  • Will the new MpiJob API be backward compatible?
  • What will be the release and versioning plan going forward?

richardsliu avatar Mar 17 '22 20:03 richardsliu

Hi @ArangoGutierrez, any progress on this?

denkensk avatar Aug 05 '22 03:08 denkensk

Given https://github.com/cncf/toc/pull/950 let's /hold

ArangoGutierrez avatar Feb 02 '23 11:02 ArangoGutierrez

I think I now have time to get back to this thread. Ping @theadactyl @alculquicondor @richardsliu: what would it take to revive it?

ArangoGutierrez avatar Feb 09 '23 13:02 ArangoGutierrez

While https://github.com/cncf/toc/pull/950 is still open, this would be a lot of effort from a copyright perspective.

So I think we should wait.

alculquicondor avatar Feb 09 '23 14:02 alculquicondor

I'd recommend waiting on CNCF donation review unless there is something being significantly blocked here.

theadactyl avatar Feb 09 '23 18:02 theadactyl

cc

tenzen-y avatar Feb 10 '23 20:02 tenzen-y

https://github.com/cncf/toc/pull/1042

ArangoGutierrez avatar Jul 25 '23 17:07 ArangoGutierrez

I guess we can restart this effort?

ArangoGutierrez avatar Jul 25 '23 17:07 ArangoGutierrez

I think we can rather drop it.

Now that kubeflow is part of CNCF, it's easier for organizations to contribute to the project.

@ArangoGutierrez do you have an additional motivation to make this a kubernetes subproject?

alculquicondor avatar Jul 26 '23 14:07 alculquicondor

> I think we can rather drop it.
>
> Now that kubeflow is part of CNCF, it's easier for organizations to contribute to the project.
>
> @ArangoGutierrez do you have an additional motivation to make this a kubernetes subproject?

Agree
/Close

ArangoGutierrez avatar Jul 26 '23 14:07 ArangoGutierrez

@ArangoGutierrez: Closed this PR.

In response to this:

> > I think we can rather drop it.
> >
> > Now that kubeflow is part of CNCF, it's easier for organizations to contribute to the project.
> >
> > @ArangoGutierrez do you have an additional motivation to make this a kubernetes subproject?
>
> Agree
> /Close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Jul 26 '23 14:07 google-oss-prow[bot]

If this is the case, can we please make some effort to update the documentation on the Kubeflow website to reflect that there are now two separate components:

  1. The unified training operator: https://github.com/kubeflow/training-operator
  2. The MPI Operator: https://github.com/kubeflow/mpi-operator

We need to update and split the following component page: https://www.kubeflow.org/docs/components/training/

thesuperzapper avatar Jul 26 '23 22:07 thesuperzapper