Add migration guide from Training Operator to Kubeflow Trainer V2

Open andreyvelich opened this issue 10 months ago • 13 comments

Ref: https://github.com/kubeflow/training-operator/issues/2170

We should add a user guide on how to migrate from Kubeflow Training Operator to Kubeflow Trainer V2.

cc @varodrig

/area docs /good-first-issue

andreyvelich avatar Feb 04 '25 14:02 andreyvelich

@andreyvelich: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Feb 04 '25 14:02 google-oss-prow[bot]

/assign

akagami-harsh avatar Feb 04 '25 15:02 akagami-harsh

Hi @akagami-harsh, are you still working on this issue, or can we look for new contributors?

andreyvelich avatar Mar 04 '25 17:03 andreyvelich

Apologies for the delay! I got caught up with other tasks, but I’m back on it now and will try to submit a PR as soon as possible.

akagami-harsh avatar Mar 04 '25 22:03 akagami-harsh

No worries. If you want, we can delegate this task to other contributors. We also need to create a lot of docs for Kubeflow Trainer cluster operators, so feel free to create separate issues if you want to propose something!

andreyvelich avatar Mar 04 '25 22:03 andreyvelich

/unassign

akagami-harsh avatar Mar 10 '25 09:03 akagami-harsh

/assign

Riddhikshah21 avatar Mar 15 '25 05:03 Riddhikshah21

/unassign

Riddhikshah21 avatar Mar 16 '25 03:03 Riddhikshah21

I would like to take this up. I'm a new contributor, so I would love any advice on the recommended solution and potential steps. @andreyvelich

nb923 avatar Mar 16 '25 15:03 nb923

/assign

akiseakusa avatar Mar 24 '25 10:03 akiseakusa

@andreyvelich Since PR #3958 is already approved, it seems we can close this issue. If there are any pending tasks or areas where I can contribute, please let me know where I might be able to jump in.

akiseakusa avatar Mar 24 '25 11:03 akiseakusa

@akiseakusa We still need to write a guide for user migration from Training Operator v1 to Kubeflow Trainer V2: https://www.kubeflow.org/docs/components/trainer/operator-guides/migration/

andreyvelich avatar Mar 24 '25 16:03 andreyvelich

@andreyvelich I’m referring to PR #2402 for the Kubeflow Trainer V2 migration guide. Are there any resources I should refer to for this?

akiseakusa avatar Mar 25 '25 17:03 akiseakusa

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jun 23 '25 20:06 github-actions[bot]

/remove-lifecycle stale

kramaranya avatar Jun 24 '25 20:06 kramaranya

Hi @akiseakusa, did you have a chance to work on this guide? We'd love to have this completed since we're planning to release trainer v2 soon. Some useful resources could be https://github.com/kubeflow/trainer/tree/master/docs/proposals/2170-kubeflow-trainer-v2, https://docs.google.com/document/d/1X1x2UwAYpKdQggbuSiF2l3i5Z3cQQztWLS-4ay7QD-Q/edit?usp=sharing. Please let us know if you're willing to continue working on this or if we should look for other contributors to help out.

kramaranya avatar Jul 09 '25 22:07 kramaranya

Hi @kramaranya, thank you for following up! I’ve created a PR to add the migration guide as discussed: kubeflow/website#4142.

It includes a PyTorchJob → TrainingRuntime migration example and a version compatibility table, and it aligns with the Trainer v2 design proposal. Please let me know if there are any suggestions or changes needed.

akiseakusa avatar Jul 11 '25 12:07 akiseakusa

This is complete: https://github.com/kubeflow/website/pull/4142

andreyvelich avatar Jul 17 '25 20:07 andreyvelich