
Training: Add documentation for the MultiKueue and spec.managedBy API

Open • Garvit-77 opened this pull request 11 months ago • 8 comments

This documentation resolves: kubeflow/training-operator/issues/2279

Garvit-77 • Jan 13 '25 14:01

Hi @Garvit-77. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] • Jan 13 '25 14:01

@andreyvelich, please review the PR and let me know if any changes are needed.

Garvit-77 • Jan 15 '25 13:01

Please help us with the review ack

mimowo • Jan 23 '25 10:01

FYI, I think the feature will be almost entirely a technical detail hidden from users, since we are going to default the field in MultiKueue via a webhook once https://github.com/kubernetes-sigs/kueue/issues/2552 is done. So describing the field is OK, but eventually the user's YAML will contain only a "plain" TFJob. That means we should be able to reference the MultiKueue docs, for example.

mimowo • Jan 23 '25 11:01

FYI, we already have a note about this in the Kueue repository, in this Markdown file: https://github.com/kubernetes-sigs/kueue/blob/main/site/content/en/docs/tasks/run/multikueue/kubeflow.md. The user-facing documentation at https://kueue.sigs.k8s.io/docs/ will be available when we release 0.11, which is planned for March 17th.

mimowo • Feb 19 '25 08:02

The location of the field changed between training-operator 1.9.0 and the new Trainer. The new Trainer is not supported yet, though. Is this documentation page meant for 1.9.0, the new Trainer, or both? If both, should we add a note that the snippet shows the YAML for training-operator 1.9.0 only?

These docs should be placed under the legacy guide for Kubeflow Training Operator 1.9: https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/

@Garvit-77, can you please rebase this PR so that the guide ends up in the correct location?

andreyvelich • Feb 19 '25 11:02

Hey @andreyvelich, I just rebased this on GitHub itself and made the changes. I had some issues with GPG during the rebase; I hope it still meets the requirement.

Garvit-77 • Feb 22 '25 23:02

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] • Apr 10 '25 17:04