Updated Dockerfiles for gRPC builds for PowerPC compilation

Open pranavpandit1 opened this issue 1 year ago • 10 comments

This PR aims to get katib/suggestions built on PowerPC.

pranavpandit1 avatar Feb 20 '24 14:02 pranavpandit1

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pranavpandit1. Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar Feb 20 '24 14:02 google-oss-prow[bot]

Hmm, I'm wondering whether we should not support the ppc64le env, since a validation environment for it generally doesn't exist. @andreyvelich @johnugeorge WDYT?

tenzen-y avatar Mar 02 '24 17:03 tenzen-y

@tenzen-y - note that this PR is part of a greater effort to enable Kubeflow and all its dependencies for multiple processor architectures (we start with ppc64le): https://github.com/kubeflow/kubeflow/issues/6684

We do have an enterprise-supported ppc64le Kubeflow distribution (https://www.ibm.com/docs/en/announcements/rocket-ai-hub-power), and we are handling validation and will take care of upstream fixes if needed. I hope that helps...

lehrig avatar Mar 07 '24 09:03 lehrig

I see. I didn't know that. @kubeflow/wg-automl-leads Did you approve this ppc64le project? If so, I'm OK with proceeding with this PR.

tenzen-y avatar Mar 09 '24 12:03 tenzen-y

Could you update CI as well? For example, we should update the platforms in these workflows:

https://github.com/kubeflow/katib/blob/a2f3fcae55d608a850eaea7ff3d25c667d2423e1/.github/workflows/publish-algorithm-images.yaml#L15

https://github.com/kubeflow/katib/blob/a2f3fcae55d608a850eaea7ff3d25c667d2423e1/.github/workflows/publish-core-images.yaml#L13

@tenzen-y Actually, we are adding ppc64le support for the other Katib components as well, and that work is in progress. Once we are in a good position where all Katib components' Dockerfiles have ppc64le support, we can proceed with updating CI. WDYT?

aditijadhav38 avatar Mar 13 '24 14:03 aditijadhav38

We shouldn't add any images without verifying them in CI. Doing so would cost us maintainability and stability. I don't mean that I don't trust you; my primary concern is future maintainability.

tenzen-y avatar Mar 13 '24 14:03 tenzen-y

cc @kubeflow/wg-training-leads

andreyvelich avatar Mar 13 '24 17:03 andreyvelich

@andreyvelich, I have directly added a comment there: https://github.com/argoproj/argo-workflows/issues/12449#issuecomment-1997792858

What I describe there also holds true here: we are happy to help with maintenance, especially if anything architecture-specific comes up. Hope that helps.

lehrig avatar Mar 14 '24 19:03 lehrig

@tenzen-y I have raised a new PR, https://github.com/kubeflow/katib/pull/2290, with the suggested changes. I've also updated one more Dockerfile, earlystopping/medianstop, which needs similar changes for the grpc installation. As you suggested, I have updated the CI files as well. Please note that we haven't yet added support for all Katib components, so the build might fail for a few images whose Dockerfiles are not yet updated (e.g. nas/enas, katib-ui, katib/tfevent-metricscollector). We are currently working on the remaining components mentioned above.

aditijadhav38 avatar Mar 19 '24 07:03 aditijadhav38
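
One possible way to avoid the build failures mentioned above while the remaining components are being ported is to choose the platform list per image instead of using one global list. The following is only a rough sketch of that idea, not what either PR does. The base platform list, the helper names, and the Dockerfile path and image tag in the usage example are assumptions made for illustration; the component names are the ones mentioned in this thread, and `--platform` is the standard `docker buildx build` flag.

```python
# Hypothetical sketch: per-image platform selection for multi-arch builds.
# BASE_PLATFORMS and the helper names are assumptions, not Katib's CI config.

BASE_PLATFORMS = ["linux/amd64", "linux/arm64"]  # assumed existing targets
PPC64LE = "linux/ppc64le"

# Components named in the thread as not yet updated for ppc64le.
NOT_YET_PORTED = {"nas/enas", "katib-ui", "katib/tfevent-metricscollector"}


def platforms_for(component: str) -> str:
    """Return a comma-separated value for `docker buildx build --platform`."""
    platforms = list(BASE_PLATFORMS)
    if component not in NOT_YET_PORTED:
        # Only request ppc64le for images whose Dockerfile already supports it.
        platforms.append(PPC64LE)
    return ",".join(platforms)


def buildx_command(component: str, dockerfile: str, tag: str) -> list[str]:
    """Assemble (but do not run) a buildx invocation for one image."""
    return [
        "docker", "buildx", "build",
        "--platform", platforms_for(component),
        "-f", dockerfile,
        "-t", tag,
        ".",
    ]


if __name__ == "__main__":
    # The Dockerfile path and tag below are placeholders for illustration.
    for name in ["earlystopping/medianstop", "katib-ui"]:
        cmd = buildx_command(name, "path/to/Dockerfile", "example/image:dev")
        print(name, "->", " ".join(cmd))
```

With a table like this, every image that already builds on ppc64le is verified in CI, and the exclusion set shrinks as each remaining Dockerfile is updated.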