
Automate Katib Releases

Open andreyvelich opened this issue 2 years ago • 22 comments

Currently, to make Katib releases we have to follow this manual process: https://github.com/kubeflow/katib/tree/master/docs/release

We run the make release command, build and publish the release Docker images locally, and publish the Katib SDK version. Since we build docker images locally, our release images don't support multi OS arch: https://hub.docker.com/layers/kubeflowkatib/katib-controller/v0.14.0/images/sha256-51ca80d6005010ff08853a5f7231158cb695ea899b623200076cbc01509fc0b5?context=repo.

The release process should be automated. For example, we can utilise GitHub Actions to make Katib releases.

cc @tenzen-y @johnugeorge


Love this feature? Give it a 👍 We prioritize the features with the most 👍

andreyvelich avatar Dec 02 '22 13:12 andreyvelich

We can use a workflow_dispatch or release trigger in GHA

johnugeorge avatar Dec 02 '22 13:12 johnugeorge

@andreyvelich Thanks for proposing this.

Since we build docker images locally, our release images don't support multi OS arch

That's right. For now, we cannot release multi-platform images by following the steps in that documentation.

The release process should be automated. For example, we can utilise GitHub Actions to make Katib releases.

I agree with you.

We can use a workflow_dispatch or release trigger in GHA

I prefer to use the release trigger.
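Whichever trigger is chosen, a release workflow step needs the version being released. The following is a minimal sketch, assuming the workflow runs on a tag ref so that the GitHub Actions variable GITHUB_REF looks like refs/tags/v0.14.0. The helper name release_version is hypothetical and not part of Katib, and the pattern check is deliberately loose.

```shell
#!/bin/sh
# Sketch: derive the release version from the Git ref that triggered the run.
# release_version is a hypothetical helper, not part of the Katib repository.

release_version() {
    ref="$1"
    case "$ref" in
        refs/tags/v[0-9]*.[0-9]*.[0-9]*)
            # Strip the refs/tags/ prefix and keep only the tag name.
            printf '%s\n' "${ref#refs/tags/}"
            ;;
        *)
            # Anything else (a branch, a malformed tag) is rejected.
            echo "not a release tag: $ref" >&2
            return 1
            ;;
    esac
}

# Example with a literal ref; in a workflow this would be "$GITHUB_REF".
release_version "refs/tags/v0.14.0"   # prints v0.14.0
```

With a workflow_dispatch trigger the version would more likely arrive as a manual input, so this check would apply to that input instead.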

tenzen-y avatar Dec 02 '22 19:12 tenzen-y

That's right. For now, we cannot release multi-platform images by following the steps in that documentation. The release process should be automated. For example, we can utilise GitHub Actions to make Katib releases.

If we could prepare a self-hosted ARM runner (or use a GitHub Actions ARM runner at extra charge), we could automate the release. How could we prepare the ARM machine?

anencore94 avatar Dec 17 '22 14:12 anencore94

That's right. For now, we cannot release multi-platform images by following the steps in that documentation. The release process should be automated. For example, we can utilise GitHub Actions to make Katib releases.

If we could prepare a self-hosted ARM runner (or use a GitHub Actions ARM runner at extra charge), we could automate the release. How could we prepare the ARM machine?

@anencore94 I mean we need to modify the make release command, since we cannot build multi-platform images using that command. Or do you mean we should prepare ARM runners to run tests for the ARM environment?

tenzen-y avatar Dec 17 '22 15:12 tenzen-y

@tenzen-y I mean that if we prepare ARM runners, we could build ARM images in GitHub Actions workflows much more easily and then publish them through manifests that include both the amd64 and arm64 images. WDYT?

I'm not sure we need to enable make release to build multi-platform images locally, but I think it would be better to publish multi-platform images at release time.
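The manifest idea can be sketched as follows, assuming each architecture's image has already been pushed under a per-architecture tag. The -amd64 and -arm64 tag suffixes are an assumed naming scheme and manifest_command is a hypothetical helper. The sketch only prints the docker buildx imagetools create command instead of running it.

```shell
#!/bin/sh
# Sketch: print the command that would merge per-architecture images into one
# multi-platform tag. The "-amd64"/"-arm64" suffixes are an assumption, and
# manifest_command is a hypothetical helper, not part of Katib.

manifest_command() {
    image="$1"   # e.g. docker.io/kubeflowkatib/katib-controller
    tag="$2"     # e.g. v0.14.0
    # "docker buildx imagetools create" builds a manifest list from
    # images that already exist in the registry.
    printf 'docker buildx imagetools create -t %s:%s %s:%s-amd64 %s:%s-arm64\n' \
        "$image" "$tag" "$image" "$tag" "$image" "$tag"
}

# Dry run: this only prints the command.
manifest_command docker.io/kubeflowkatib/katib-controller v0.14.0
```

Printing rather than executing keeps the sketch safe to try without registry credentials. Piping the output to sh would run the real command.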

anencore94 avatar Dec 19 '22 11:12 anencore94

@tenzen-y I mean that if we prepare ARM runners, we could build ARM images in GitHub Actions workflows much more easily and then publish them through manifests that include both the amd64 and arm64 images. WDYT?

I'm not sure we need to enable make release to build multi-platform images locally, but I think it would be better to publish multi-platform images at release time.

@anencore94 I see. We can build multiplatform images using the default amd64 runner. Actually, we publish multi-platform images for every commit like this.

Probably, we don't need arm64 runners for the multi-platform build.

Does that sound good to you?

tenzen-y avatar Dec 19 '22 11:12 tenzen-y

We can build multiplatform images using the default amd64 runner. Actually, we publish multi-platform images for every commit like this. Probably, we don't need arm64 runners for the multi-platform build.

Sure, but building an ARM image on an amd64 runner would be much slower, since it uses an emulator such as QEMU to build the ARM image. So if we could prepare an arm64 runner, that would be better. However, if that is not affordable, then yes, I agree to build it on the amd64 runner. @tenzen-y
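To make the trade-off concrete, here is a small sketch that lists which target platforms a given runner cannot build natively and would therefore build under emulation. The helper emulated_platforms is hypothetical, and the mapping from uname -m output to Docker platform names covers only the two architectures discussed here.

```shell
#!/bin/sh
# Sketch: list the target platforms that are not native to the runner.
# emulated_platforms is a hypothetical helper, not part of Katib or Docker.

emulated_platforms() (
    machine="$1"    # output of `uname -m` on the runner
    targets="$2"    # comma-separated, e.g. linux/amd64,linux/arm64
    case "$machine" in
        x86_64)        native="linux/amd64" ;;
        aarch64|arm64) native="linux/arm64" ;;
        *)             native="" ;;
    esac
    # The function body is a subshell, so changing IFS here does not leak out.
    IFS=','
    for platform in $targets; do
        # Print every requested platform that differs from the native one.
        [ "$platform" = "$native" ] || printf '%s\n' "$platform"
    done
)

# On the current machine, for the two platforms discussed in this thread:
emulated_platforms "$(uname -m)" "linux/amd64,linux/arm64"
```

On an amd64 runner this reports linux/arm64 as the emulated platform, which is the build that native arm64 runners would speed up.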

anencore94 avatar Dec 19 '22 11:12 anencore94

We can build multiplatform images using the default amd64 runner. Actually, we publish multi-platform images for every commit like this. Probably, we don't need arm64 runners for the multi-platform build.

Sure, but building an ARM image on an amd64 runner would be much slower, since it uses an emulator such as QEMU to build the ARM image. So if we could prepare an arm64 runner, that would be better. However, if that is not affordable, then yes, I agree to build it on the amd64 runner. @tenzen-y

@anencore94 I see. That's a great idea, and I agree. It would speed up the build if we could prepare arm64 runners. Maybe the docker buildx create --append command and a remote build instance could help us.

Using multiple native nodes provide better support for more complicated cases that are not handled by QEMU and generally have better performance. You can add additional nodes to the builder instance using the --append flag.

Assuming contexts node-amd64 and node-arm64 exist in docker context ls;

 docker buildx create --use --name mybuild node-amd64
 docker buildx create --append --name mybuild node-arm64
 docker buildx build --platform linux/amd64,linux/arm64 .

https://docs.docker.com/build/building/multi-platform/#building-multi-platform-images

The Buildx remote driver allows for more complex custom build workloads, allowing you to connect to externally managed BuildKit instances. This is useful for scenarios that require manual management of the BuildKit daemon, or where a BuildKit daemon is exposed from another source.

docker buildx create \
  --name remote-unix \
  --driver remote \
  unix://$HOME/buildkitd.sock

https://docs.docker.com/build/drivers/remote/

tenzen-y avatar Dec 19 '22 12:12 tenzen-y

@anencore94 @tenzen-y Currently, we are not using self-hosted runners. At some point we need to review whether we can use self-hosted runners in AWS.

johnugeorge avatar Dec 21 '22 03:12 johnugeorge

I'm willing to help create the flows for the release. Do let me know if you guys need any help once we have some agreement on runners.

midhun1998 avatar Feb 26 '23 18:02 midhun1998

Hi @midhun1998, that would be great! Currently, we follow this manual process for our releases: https://github.com/kubeflow/katib/tree/master/docs/release. We can discuss how to automate it (e.g. using GitHub Actions) at the upcoming AutoML + Training WG Meeting.

andreyvelich avatar Feb 27 '23 14:02 andreyvelich

I'd like to contribute to this automation too. See you at the next meeting :)

anencore94 avatar Mar 01 '23 14:03 anencore94

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Aug 23 '23 20:08 github-actions[bot]

/remove-lifecycle stale

tenzen-y avatar Aug 24 '23 13:08 tenzen-y

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Nov 22 '23 15:11 github-actions[bot]

/lifecycle frozen
/help

andreyvelich avatar Nov 22 '23 15:11 andreyvelich

@andreyvelich: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/lifecycle frozen
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Nov 22 '23 15:11 google-oss-prow[bot]

/good-first-issue

This is a good issue to work on if you are familiar with GitHub Actions and can help us automate releases for Katib/Training Operator. Feel free to propose your ideas and suggestions.

andreyvelich avatar Mar 15 '24 19:03 andreyvelich

@andreyvelich: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue

This is a good issue to work on if you are familiar with GitHub Actions and can help us automate releases for Katib/Training Operator. Feel free to propose your ideas and suggestions.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Mar 15 '24 19:03 google-oss-prow[bot]

/assign

xr-dev-saurabh avatar Mar 29 '24 12:03 xr-dev-saurabh