Improve Training Operator release process

Open andreyvelich opened this issue 1 year ago • 13 comments

Related: https://github.com/kubeflow/katib/issues/2049

We need to improve our release process for Training Operator:

  • Branch names should follow this pattern: release-X.Y. Similar to Katib or Kubernetes.
  • Automate release with GitHub Actions.
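As an illustration of the branch naming rule, here is a minimal Python sketch that checks whether a branch follows release-X.Y. The function name is hypothetical and not part of any existing Training Operator tooling.

```python
import re
from typing import Optional, Tuple

# Pattern for the proposed "release-X.Y" convention (X and Y are integers).
RELEASE_BRANCH_RE = re.compile(r"release-(\d+)\.(\d+)")


def parse_release_branch(branch: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) if the branch is named release-X.Y, else None."""
    match = RELEASE_BRANCH_RE.fullmatch(branch)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for name in ("release-2.0", "release-2.0.1", "v1.8-branch"):
        print(name, parse_release_branch(name))
```

A release workflow could run such a check first and stop early when it is triggered from a branch that does not match.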

/good-first-issue /help

andreyvelich avatar Jun 25 '24 19:06 andreyvelich

@andreyvelich: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

google-oss-prow[bot] avatar Jun 25 '24 19:06 google-oss-prow[bot]

I want to take this. /assign

7h3-3mp7y-m4n avatar Jun 26 '24 07:06 7h3-3mp7y-m4n

Additionally, I would like to use semantic versioning image tags for every release here: https://github.com/kubeflow/training-operator/blob/f8687ca7fd947e6ebd52dde4dfeefdf006e7b239/manifests/overlays/standalone/kustomization.yaml#L9

tenzen-y avatar Jul 25 '24 07:07 tenzen-y

okay I'll look at it and raise a PR ASAP

7h3-3mp7y-m4n avatar Jul 29 '24 11:07 7h3-3mp7y-m4n

No one is working on this one right? I can take a look /assign

Deathfireofdoom avatar Oct 31 '24 12:10 Deathfireofdoom

> No one is working on this one right? I can take a look /assign

Yes, feel free to take this.

tenzen-y avatar Nov 01 '24 17:11 tenzen-y

Thank you for your time @Deathfireofdoom! I would also suggest checking how we refactored and automated the Spark Operator release process with @ChenYi015: https://github.com/kubeflow/spark-operator/pull/2089

I think we can reuse some of the steps.

andreyvelich avatar Nov 01 '24 17:11 andreyvelich

Hey @Deathfireofdoom if you're not working on this issue, then I would like to work on it.

Veer0x1 avatar Dec 14 '24 10:12 Veer0x1

@Veer0x1 Sorry for the delay, started tackling it but got hectic at work, chapter 11 stuff hahah, so will probably not have time to look into this more until after holiday anyway! So feel free to take it! :)

Deathfireofdoom avatar Dec 18 '24 17:12 Deathfireofdoom

/assign

Veer0x1 avatar Dec 21 '24 05:12 Veer0x1

@andreyvelich Any suggestion on how to handle changelog generation?

milinddethe15 avatar Mar 15 '25 20:03 milinddethe15

@Veer0x1 Please let us know if you still want to work on this. Given that we are very close to making the first Kubeflow Trainer releases, it would be great to automate our process!

/unassign @Veer0x1

@milinddethe15 Could you help us explore how others solve it? Do we need to introduce a PR title check to simplify Changelog generation:

feat(...)
fix(...)
chore(...)
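A minimal Python sketch of such a title check follows. It assumes titles look like `type(scope): subject`; the `: ` separator and the function name are assumptions, and this is not the action used by Kubeflow Notebooks.

```python
import re

# Types taken from the list above; extend as the project decides.
ALLOWED_TYPES = ("feat", "fix", "chore")

# Assumed shape: type(scope): subject
PR_TITLE_RE = re.compile(r"(?P<type>[a-z]+)\((?P<scope>[^()]+)\): (?P<subject>\S.*)")


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title matches type(scope): subject with an allowed type."""
    match = PR_TITLE_RE.fullmatch(title)
    return match is not None and match.group("type") in ALLOWED_TYPES


if __name__ == "__main__":
    for title in ("feat(sdk): add release workflow", "Add release workflow"):
        print(title, "->", is_valid_pr_title(title))
```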

I think we can use the same action that @thesuperzapper used for Kubeflow Notebooks, and just update the types and scopes.

Do you want to work on this, @milinddethe15?

andreyvelich avatar Mar 18 '25 18:03 andreyvelich

Yes, I am happy to help with this. I will look at how other KF projects are doing this and give an update here before opening a PR.

/assign

milinddethe15 avatar Mar 18 '25 20:03 milinddethe15

Hi @milinddethe15, did you get a chance to work on this issue ?

We are planning to release Kubeflow Trainer 2.0 soon, and it would be nice to have release automation for it: https://github.com/kubeflow/trainer/issues/2170

andreyvelich avatar Apr 11 '25 23:04 andreyvelich

I am working on it.

milinddethe15 avatar Apr 14 '25 10:04 milinddethe15

@andreyvelich I have pushed the commits and created a PR (https://github.com/milinddethe15/kf-trainer/pull/1) in my fork for testing. However, the GitHub Actions jobs using ubuntu-latest-16-cores aren't getting started. Is there any workaround to test the release process?

milinddethe15 avatar Apr 14 '25 15:04 milinddethe15

@milinddethe15 Do you want to try the default runner, ubuntu-latest, to test your release action? Also, FYI, we don't need to release the SDK as part of the Kubeflow Trainer release, since it will be decoupled from kubeflow/trainer after this KEP: https://github.com/kubeflow/community/pull/823.

andreyvelich avatar Apr 14 '25 22:04 andreyvelich

> @milinddethe15 Do you want to try the default runner: ubuntu-latest to try out your release action ?

yeah, I will try that out.

> Also, FYI, we don't need to release SDK as part of Kubeflow Trainer release since it will be decoupled from kubeflow/trainer after this KEP: kubeflow/community#823.

Will this delay the Trainer v2.0 release until the NEW SDK is available?

milinddethe15 avatar Apr 15 '25 05:04 milinddethe15

> > @milinddethe15 Do you want to try the default runner: ubuntu-latest to try out your release action ?
>
> yeah, I will try that out.

I have used the ubuntu-latest runners but the e2e tests are failing due to: no space left on device

milinddethe15 avatar Apr 15 '25 10:04 milinddethe15

> Will this delay the Trainer v2.0 release until the NEW SDK is available?

No, we don't need to delay Trainer v2.0. For now, we just ask users to install the SDK directly from the kubeflow/sdk repository.

> I have used the ubuntu-latest runners but the e2e tests are failing due to: no space left on device

Can you try to test it without building the images? Maybe you can just "fake" the image build to verify that the rest of the steps work correctly?

andreyvelich avatar Apr 15 '25 11:04 andreyvelich

Hi @milinddethe15, do you think we can target this enhancement before the Kubeflow Trainer 2.0 release? We are planning to cut the release before May 5th.

andreyvelich avatar Apr 23 '25 02:04 andreyvelich

I have successfully set up the release actions; see my forked release branch: https://github.com/milinddethe15/kf-trainer/tree/release-2.0. Automating the changelog generation in the draft release is still pending. We can use: https://github.com/kubeflow/trainer/blob/master/docs/release/changelog.py. However, grouping PRs into Breaking Changes, New Features, Bug Fixes, Misc, etc. will be a manual task. So, can this step be skipped (I mean, the CHANGELOG would need to be updated manually)?

milinddethe15 avatar Apr 23 '25 11:04 milinddethe15

That's great, yes, I think we can skip the Changelog generation for now.

For the Changelog, shall we apply the PR name validation to ask contributors to name PRs as follows: feat(...) chore(...) fix(...)

Similar to KFP and Kubeflow Notebooks?
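To illustrate why such prefixes would help, here is a rough Python sketch that groups PR titles into changelog sections by their prefix. The type-to-section mapping and the function name are assumptions, and it is independent of docs/release/changelog.py. Breaking changes cannot be inferred from these three prefixes alone, so they would still need an extra marker or a manual pass.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed mapping from PR title type to changelog section.
SECTION_BY_TYPE = {"feat": "New Features", "fix": "Bug Fixes", "chore": "Misc"}

# Captures the leading "type" of a title shaped like type(scope)...
PREFIX_RE = re.compile(r"^([a-z]+)\([^()]*\)")


def group_titles(titles: List[str]) -> Dict[str, List[str]]:
    """Group PR titles by section; titles without a known prefix go to Misc."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        match = PREFIX_RE.match(title)
        pr_type = match.group(1) if match else None
        sections[SECTION_BY_TYPE.get(pr_type, "Misc")].append(title)
    return dict(sections)


if __name__ == "__main__":
    example = ["feat(api): add field", "fix(ci): free disk space", "Update docs"]
    for section, items in group_titles(example).items():
        print(section, items)
```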

WDYT @kubeflow/wg-training-leads @Electronic-Waste @astefanutti?

andreyvelich avatar Apr 23 '25 16:04 andreyvelich

@milinddethe15 Also, why are the images not updated in the Kustomize manifests in your branch? E.g. we should keep this image tag: v2.0.0 https://github.com/milinddethe15/kf-trainer/blob/release-2.0/manifests/overlays/manager/kustomization.yaml#L17

andreyvelich avatar Apr 23 '25 16:04 andreyvelich

> For the Changelog, shall we apply the PR name validation to ask contributors to name PRs as follows: feat(...) chore(...) fix(...)
>
> Similar to KFP and Kubeflow Notebooks ?

SGTM

Electronic-Waste avatar Apr 23 '25 17:04 Electronic-Waste

> @milinddethe15 Also, why in your branch the images are not updated in the Kustomize manifests ? E.g. we should keep this image tag: v2.0.0 https://github.com/milinddethe15/kf-trainer/blob/release-2.0/manifests/overlays/manager/kustomization.yaml#L17

I am just testing the release actions here. That said, we should check whether the image tags match the VERSION in the Check Release action.
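A rough Python sketch of what such a check might look like is below. It assumes the tags appear as `newTag:` entries in a kustomization.yaml. The function name, sample manifest and image names are hypothetical, and it scans the text with a regular expression rather than a YAML parser to stay dependency-free.

```python
import re
from typing import List

# Matches "newTag: <value>" lines, optionally quoted or starting a list item.
NEW_TAG_RE = re.compile(r"^[ \t]*-?[ \t]*newTag:[ \t]*[\"']?([^\"'\s#]+)", re.MULTILINE)


def mismatched_tags(kustomization_text: str, version: str) -> List[str]:
    """Return every newTag value that differs from the expected version."""
    tags = NEW_TAG_RE.findall(kustomization_text)
    return [tag for tag in tags if tag != version]


if __name__ == "__main__":
    # Hypothetical manifest content, for illustration only.
    sample = (
        "images:\n"
        "  - name: example.invalid/controller\n"
        "    newTag: v2.0.0\n"
        "  - name: example.invalid/initializer\n"
        "    newTag: latest\n"
    )
    bad = mismatched_tags(sample, "v2.0.0")
    if bad:
        print("image tags not matching VERSION:", bad)
```

A release check could fail the job whenever the returned list is non-empty.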

milinddethe15 avatar Apr 24 '25 13:04 milinddethe15

@milinddethe15 Should we update the image tag as part of the release action? For example, we do that in the Katib release script: https://github.com/kubeflow/katib/blob/master/scripts/v1beta1/release.sh#L71
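As a sketch of the idea only (not a port of the Katib script), a release step could rewrite every `newTag:` value in a kustomization.yaml to the release version. The function name is hypothetical, and this simple text substitution drops any trailing comment on the rewritten line.

```python
import re

# Captures the "newTag:" key with its indentation so only the value is replaced.
NEW_TAG_LINE_RE = re.compile(r"^([ \t]*-?[ \t]*newTag:[ \t]*).*$", re.MULTILINE)


def set_image_tags(kustomization_text: str, version: str) -> str:
    """Return the manifest text with every newTag value set to version."""
    return NEW_TAG_LINE_RE.sub(lambda m: m.group(1) + version, kustomization_text)


if __name__ == "__main__":
    # Hypothetical manifest content, for illustration only.
    before = "images:\n  - name: example.invalid/controller\n    newTag: latest\n"
    print(set_image_tags(before, "v2.0.0"))
```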

andreyvelich avatar Apr 24 '25 14:04 andreyvelich

Yes, we can do that.

milinddethe15 avatar Apr 24 '25 18:04 milinddethe15

@milinddethe15 @Veer0x1 Do you want to finalize your PRs to automate the Kubeflow Trainer release process, or should we find a new contributor for it?

  • https://github.com/kubeflow/trainer/pull/2359
  • https://github.com/kubeflow/trainer/pull/2623

andreyvelich avatar May 28 '25 19:05 andreyvelich

/area engprod

andreyvelich avatar May 28 '25 22:05 andreyvelich