Improve Training Operator release process
Related: https://github.com/kubeflow/katib/issues/2049
We need to improve our release process for Training Operator:
- Branch names should follow this pattern: release-X.Y, similar to Katib or Kubernetes.
- Automate release with GitHub Actions.
/good-first-issue /help
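A minimal sketch of how the release-X.Y convention could be checked in a release script. The helper names `is_release_branch` and `branch_for_version` are hypothetical, and the rule that a version such as v2.0.0 maps to release-2.0 is an assumption based on the pattern above, not something the project has confirmed.

```python
import re

# Assumed pattern: "release-" followed by MAJOR.MINOR, e.g. release-2.0.
RELEASE_BRANCH_RE = re.compile(r"release-(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_release_branch(name: str) -> bool:
    """Return True if `name` has the form release-X.Y."""
    return RELEASE_BRANCH_RE.fullmatch(name) is not None


def branch_for_version(version: str) -> str:
    """Map a version such as v2.0.0 to its assumed release branch."""
    match = re.match(r"v?(\d+)\.(\d+)\.", version)
    if match is None:
        raise ValueError(f"not an X.Y.Z version: {version!r}")
    return f"release-{match.group(1)}.{match.group(2)}"
```

A release workflow could call `is_release_branch` on the current branch name and stop early when it returns False.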
@andreyvelich: This request has been marked as suitable for new contributors.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.
I want to take this. /assign
Additionally, I would like to use semantic versioning image tags for every release here: https://github.com/kubeflow/training-operator/blob/f8687ca7fd947e6ebd52dde4dfeefdf006e7b239/manifests/overlays/standalone/kustomization.yaml#L9
okay I'll look at it and raise a PR ASAP
No one is working on this one right? I can take a look /assign
Yes, feel free to take this.
Thank you for your time @Deathfireofdoom! I would also suggest checking how we refactored and automated the Spark Operator release process with @ChenYi015: https://github.com/kubeflow/spark-operator/pull/2089
I think, we can re-use some of the steps.
Hey @Deathfireofdoom if you're not working on this issue, then I would like to work on it.
@Veer0x1 Sorry for the delay, started tackling it but got hectic at work, chapter 11 stuff hahah, so will probably not have time to look into this more until after holiday anyway! So feel free to take it! :)
/assign
@andreyvelich Any suggestion on how to handle changelog generation?
@Veer0x1 Please let us know if you still want to work on this. Given that we are very close to making the first Kubeflow Trainer releases, it would be great to automate our process!
/unassign @Veer0x1
@milinddethe15 Could you help us explore how others solve it? Do we need to introduce a PR title check to simplify Changelog generation:
feat(...)
fix(...)
chore(...)
I think, we can use the same action as @thesuperzapper used for Kubeflow Notebooks, but just update the types and scope
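For illustration, a PR title check along these lines could look like the sketch below. It is not the action used by Kubeflow Notebooks. The `type(scope): subject` layout follows the Conventional Commits convention and is an assumption here, since the thread only shows `feat(...)`, `fix(...)` and `chore(...)`. The name `check_pr_title` is hypothetical.

```python
import re

# Assumed allow-list; the thread only names feat, fix and chore.
ALLOWED_TYPES = ("feat", "fix", "chore")

TITLE_RE = re.compile(
    r"(?P<type>[a-z]+)"            # change type, e.g. feat
    r"(?:\((?P<scope>[^()]+)\))?"  # optional scope in parentheses
    r": (?P<subject>\S.*)"         # colon, space, non-empty subject
)


def check_pr_title(title: str) -> tuple[bool, str]:
    """Return (ok, message) for a single-line PR title."""
    match = TITLE_RE.fullmatch(title)
    if match is None:
        return False, "title must look like 'type(scope): subject'"
    if match.group("type") not in ALLOWED_TYPES:
        return False, f"unknown type {match.group('type')!r}"
    return True, "ok"
```

The allowed types and scopes would need to be adjusted to whatever the project agrees on.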
Do you want to work on this @milinddethe15 ?
Yes, I am happy to help with this. I will look at how other KF projects are doing this and give an update here before opening a PR.
/assign
Hi @milinddethe15, did you get a chance to work on this issue ?
We are planning to release Kubeflow Trainer 2.0 soon, and it would be nice to have release automation for it: https://github.com/kubeflow/trainer/issues/2170
I am working on it.
@andreyvelich I have pushed the commits and created a PR (https://github.com/milinddethe15/kf-trainer/pull/1) in my fork for testing. However, the GitHub Actions jobs that use ubuntu-latest-16-cores aren't getting started.
Is there any workaround to test the release process?
@milinddethe15 Do you want to try the default runner: ubuntu-latest to try out your release action ?
Also, FYI, we don't need to release SDK as part of Kubeflow Trainer release since it will be decoupled from kubeflow/trainer after this KEP: https://github.com/kubeflow/community/pull/823.
> @milinddethe15 Do you want to try the default runner: ubuntu-latest to try out your release action ?
yeah, I will try that out.
> Also, FYI, we don't need to release SDK as part of Kubeflow Trainer release since it will be decoupled from kubeflow/trainer after this KEP: kubeflow/community#823.
Will this delay the Trainer v2.0 release until the NEW SDK is available?
I have used the ubuntu-latest runners but the e2e tests are failing due to: no space left on device
> Will this delay the Trainer v2.0 release until the NEW SDK is available?
No, we don't need to delay Trainer v2.0.
For now, we just ask users to directly install SDK from the kubeflow/sdk repository.
> I have used the ubuntu-latest runners but the e2e tests are failing due to: no space left on device
Can you try to test it without building the images? Maybe you can just "fake" the image build to verify that the rest of the steps are working correctly?
Hi @milinddethe15, do you think we can target this enhancement before the Kubeflow Trainer 2.0 release? We are planning to cut the release before May 5th.
I have successfully set up the release actions. See my forked release branch: https://github.com/milinddethe15/kf-trainer/tree/release-2.0
What is still pending is automating the changelog generation for the draft release.
We can use: https://github.com/kubeflow/trainer/blob/master/docs/release/changelog.py. However, grouping PRs into Breaking Changes, New Features, Bug fixes, Misc, etc. will be a manual task.
So, can this step be skipped (I mean, CHANGELOG needs to be updated manually)?
That's great, yes, I think we can skip the Changelog generation for now.
For the Changelog, shall we apply the PR name validation to ask contributors to name PRs as follows: feat(...) chore(...) fix(...)
Similar to KFP and Kubeflow Notebooks ?
WDYT @kubeflow/wg-training-leads @Electronic-Waste @astefanutti ?
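As a rough sketch of why prefixed PR titles would help: once titles start with `feat`, `fix` or `chore`, grouping them into changelog sections becomes mechanical. The mapping in `SECTIONS` and the function names are hypothetical and are not taken from the existing changelog.py. Breaking changes cannot be inferred from these three prefixes alone, so that section would still need manual review.

```python
from collections import defaultdict

# Hypothetical mapping from PR title prefix to changelog section.
SECTIONS = {"feat": "New Features", "fix": "Bug fixes", "chore": "Misc"}
SECTION_ORDER = ("New Features", "Bug fixes", "Misc")


def group_by_section(titles):
    """Group PR titles by section; unprefixed titles fall back to Misc."""
    grouped = defaultdict(list)
    for title in titles:
        prefix = title.split(":", 1)[0]                # "feat(api)" or "feat"
        change_type = prefix.split("(", 1)[0].strip()  # drop the scope
        grouped[SECTIONS.get(change_type, "Misc")].append(title)
    return dict(grouped)


def render(grouped):
    """Render the grouped titles as Markdown sections."""
    lines = []
    for section in SECTION_ORDER:
        if grouped.get(section):
            lines.append(f"## {section}")
            lines.extend(f"- {title}" for title in grouped[section])
            lines.append("")
    return "\n".join(lines).rstrip()
```

The input list of titles could come from any source, for example the output of the existing changelog script.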
@milinddethe15 Also, why are the images not updated in the Kustomize manifests in your branch?
E.g. we should keep this image tag: v2.0.0
https://github.com/milinddethe15/kf-trainer/blob/release-2.0/manifests/overlays/manager/kustomization.yaml#L17
> For the Changelog, shall we apply the PR name validation to ask contributors to name PRs as follows: feat(...) chore(...) fix(...)
> Similar to KFP and Kubeflow Notebooks ?
SGTM
> @milinddethe15 Also, why in your branch the images are not updated in the Kustomize manifests ? E.g. we should keep this image tag: v2.0.0
> https://github.com/milinddethe15/kf-trainer/blob/release-2.0/manifests/overlays/manager/kustomization.yaml#L17
I am just testing the release actions here. That said, we should check in the Check Release action whether the image tags match the VERSION.
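One way such a check could look, as a sketch only: scan the Kustomize manifest for `newTag` values and report any that differ from the release version. The function name `mismatched_tags` is hypothetical, and the regex approach assumes simple `newTag: value` lines rather than parsing YAML in full.

```python
import re

# Matches lines such as "    newTag: v2.0.0" (optionally quoted).
NEW_TAG_RE = re.compile(r"""^[ \t]*newTag:[ \t]*["']?([^"'\s#]+)""", re.MULTILINE)


def mismatched_tags(kustomization_text: str, version: str) -> list[str]:
    """Return every newTag value that differs from `version`."""
    return [tag for tag in NEW_TAG_RE.findall(kustomization_text) if tag != version]
```

A release check could fail the job whenever the returned list is non-empty.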
@milinddethe15 Should we update the image tag as part of release action ? For example, we do that in the Katib release script: https://github.com/kubeflow/katib/blob/master/scripts/v1beta1/release.sh#L71
Yes, we can do that.
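A sketch of how the tag update could be done from a release script, assuming the manifests pin images through Kustomize `newTag` fields. This is not a port of the Katib script linked above, and `set_image_tags` is a hypothetical helper.

```python
import re

# Captures the "newTag:" key with its indentation so only the value changes.
NEW_TAG_LINE_RE = re.compile(r"^([ \t]*newTag:[ \t]*)\S+", re.MULTILINE)


def set_image_tags(kustomization_text: str, version: str) -> str:
    """Replace the value of every newTag field with `version`."""
    return NEW_TAG_LINE_RE.sub(lambda m: m.group(1) + version, kustomization_text)
```

The function works on text, so the caller decides which kustomization.yaml files to read and write back.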
@milinddethe15 @Veer0x1 Do you want to finalize your PR to automate the Kubeflow Trainer release process, or should we find a new contributor for it?
- https://github.com/kubeflow/trainer/pull/2359
- https://github.com/kubeflow/trainer/pull/2623
/area engprod