katib
Updated Dockerfiles for gRPC builds for PowerPC compilation
This PR aims to get katib/suggestions built on PowerPC.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: pranavpandit1. Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.
Uhm, I'm wondering if we shouldn't support ppc64le env since the validation environment generally doesn't exist. @andreyvelich @johnugeorge WDYT?
@tenzen-y - note that this PR is part of a greater effort to enable Kubeflow and all its dependencies for multiple processor architectures (we start with ppc64le): https://github.com/kubeflow/kubeflow/issues/6684
We do have an enterprise-supported ppc64le Kubeflow distribution (https://www.ibm.com/docs/en/announcements/rocket-ai-hub-power), and we ensure validation and take care of upstream fixes if needed. I hope that helps...
I see. I didn't know that. @kubeflow/wg-automl-leads Did you approve this ppc64le project? If so, I'm OK with proceeding with this PR.
Could you update CI as well? For example, we should update the platforms here:
https://github.com/kubeflow/katib/blob/a2f3fcae55d608a850eaea7ff3d25c667d2423e1/.github/workflows/publish-algorithm-images.yaml#L15
https://github.com/kubeflow/katib/blob/a2f3fcae55d608a850eaea7ff3d25c667d2423e1/.github/workflows/publish-core-images.yaml#L13
@tenzen-y Actually, we are adding ppc64le support for the other Katib components as well; that work is in progress. Once we are in a good position, where the Dockerfiles of all Katib components have ppc64le support, we can proceed with updating CI. WDYT?
We shouldn't add any images without verifying them in CI. Doing so would cost us maintainability and stability. I don't mean that I don't trust you; my primary concern is future maintainability.
cc @kubeflow/wg-training-leads
- What are your thoughts on this: Add support for ppc64le argoproj/argo-workflows#12449. How can we ensure that we have enough maintenance support and user stories around using Kubeflow with PowerPC?
@andreyvelich, I have added a comment directly there: https://github.com/argoproj/argo-workflows/issues/12449#issuecomment-1997792858
What I describe there also holds true here: we are happy to help with maintenance, especially if anything architecture-specific comes up. Hope that helps.
@tenzen-y I have raised a new PR, https://github.com/kubeflow/katib/pull/2290, with the suggested changes. I've also updated one more Dockerfile, earlystopping/medianstop, which needs similar changes for the grpc installation. As per your suggestion, I have updated the CI files as well. Please note that we haven't added support for all Katib components yet, so the build might fail for the few images whose Dockerfiles are not yet updated (e.g. nas/enas, katib-ui, katib/tfevent-metricscollector). We are currently working on the remaining components mentioned above.
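One way to avoid failing builds while support is still partial is to request ppc64le only for images whose Dockerfiles already handle it. The following is a minimal sketch of that idea, not code from either PR: the helper `platforms_for`, the constant names, and the assumed base platform list are all hypothetical; only the three component names come from the comment above.

```python
# Hypothetical sketch: these names are illustrative and do not come from the Katib repository.

# Assumption: the platforms CI already builds for; the real list may differ.
BASE_PLATFORMS = ["linux/amd64", "linux/arm64"]
PPC64LE = "linux/ppc64le"

# Components named in the comment above as not yet updated for ppc64le.
NOT_YET_PPC64LE = {
    "nas/enas",
    "katib-ui",
    "katib/tfevent-metricscollector",
}


def platforms_for(image: str) -> str:
    """Return a comma-separated platform list for one image.

    ppc64le is appended only when the image is not in the
    not-yet-supported set.
    """
    platforms = list(BASE_PLATFORMS)
    if image not in NOT_YET_PPC64LE:
        platforms.append(PPC64LE)
    return ",".join(platforms)


if __name__ == "__main__":
    for image in ("earlystopping/medianstop", "katib-ui"):
        print(f"{image}: {platforms_for(image)}")
```

The comma-separated string is the form that `docker buildx build --platform` accepts, so a value like this could be passed through to a build step. Once the remaining components gain ppc64le support, the exclusion set would simply become empty.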