!chore: Remove support for MXJob

Open terrytangyuan opened this issue 1 year ago • 9 comments

Unfortunately, Apache MXNet has been archived. This PR removes MXJob support.

terrytangyuan avatar Nov 28 '23 18:11 terrytangyuan

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar Nov 28 '23 18:11 google-oss-prow[bot]

Pull Request Test Coverage Report for Build 7023065747

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • 6 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.9%) to 41.988%

Files with Coverage Reduction | New Missed Lines | %
pkg/controller.v1/mpi/mpijob_controller.go | 6 | 80.67%

Totals | Coverage Status
Change from base Build 7007158786 | -0.9%
Covered Lines | 3346
Relevant Lines | 7969

💛 - Coveralls

github-actions[bot] avatar Nov 28 '23 18:11 github-actions[bot]

Pull Request Test Coverage Report for Build 7023661984

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.9%) to 41.975%

Files with Coverage Reduction | New Missed Lines | %
pkg/controller.v1/mpi/mpijob.go | 1 | 91.06%

Totals | Coverage Status
Change from base Build 7007158786 | -0.9%
Covered Lines | 3345
Relevant Lines | 7969

💛 - Coveralls

coveralls avatar Nov 28 '23 19:11 coveralls

This would be a breaking change but I think it's relatively safe given how few people are using MXNet these days. cc @kubeflow/wg-training-leads

terrytangyuan avatar Nov 29 '23 20:11 terrytangyuan

Thank you for creating this, @terrytangyuan. It is sad to hear that Apache MXNet has been archived; it had a lot of potential.

I think we still use MXNet for various things. For example, this Trial image is the most popular one in Katib: https://github.com/kubeflow/katib/tree/master/examples/v1beta1/trial-images/mxnet-mnist

Let's discuss how to inform Kubeflow users about this in one of our upcoming WG meetings.

/hold

andreyvelich avatar Nov 29 '23 20:11 andreyvelich

Any concerns about removing this? I don't think we want to continue supporting it.

terrytangyuan avatar Jan 25 '24 00:01 terrytangyuan

Related: https://github.com/kubeflow/training-operator/issues/1996. We should merge this PR once we make the Training Operator 1.8 release.

andreyvelich avatar Jan 25 '24 20:01 andreyvelich

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Apr 25 '24 00:04 github-actions[bot]

/remove-lifecycle stale

tenzen-y avatar Apr 25 '24 02:04 tenzen-y

Does anyone want to pick this up?

terrytangyuan avatar May 16 '24 18:05 terrytangyuan

I don't think I'll have time to complete this. If you are interested in contributing, feel free to start a new PR.

terrytangyuan avatar Jun 23 '24 23:06 terrytangyuan

I can pick up this one.

tariq-hasan avatar Jun 24 '24 21:06 tariq-hasan

Thank you, @tariq-hasan! Please feel free to submit another PR to remove Apache MXNet support, and we can merge it after the Kubeflow Training 1.8 release.

andreyvelich avatar Jun 24 '24 23:06 andreyvelich

Sounds good.

tariq-hasan avatar Jun 25 '24 00:06 tariq-hasan

Great. Thanks!

terrytangyuan avatar Jun 25 '24 01:06 terrytangyuan