Umbrella Issue: Porting Kubeflow to IBM Power (ppc64le)

Open lehrig opened this issue 3 years ago • 21 comments

/kind feature

Enable builds & releases for IBM Power (ppc64le architecture). This proposal was presented with these slides at the 2022-10-25 Kubeflow community call and received positive community feedback. We also created this design document: https://docs.google.com/document/d/1nGUvLonahoLogfWCHsoUOZl-s77YtPEiCjWBVlZjJHo/edit?usp=sharing

Why you need this feature:

  • Widen scope of possible on-premises deployments (vanilla Kubernetes & OpenShift on Power)
  • More general independence regarding processor architecture (x86, ppc64le, arm, …)
  • Unified container builds

Describe the solution you'd like:

  • Upstreaming changes that allow building Dockerfiles for multiple architectures (starting with x86 & ppc64le)
  • Upstreaming CI integration for multi-arch builds (starting with x86 & ppc64le)

We currently plan to divide our efforts into multiple phases:

  1. low-hanging "easy" integrations where no or minor code changes are needed; excluding KFP; Kubeflow 1.7 release scope (✅ done),
  2. same as 1. but now including additional KServe components for model serving; Kubeflow 1.8 release scope,
  3. same as 1. but now including KFP; Kubeflow 1.9 release scope,
  4. more complex integrations where external dependencies to python wheels exist.

Below is a detailed overview of each required integration, including links to associated PRs if those already exist.

Phase 1 Integrations (Kubeflow 1.7 scope)

  • [x] Poddefaults (Admission) Webhook: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6803 🚀 https://hub.docker.com/r/kubeflownotebookswg/poddefaults-webhook/tags
  • [x] Central Dashboard: https://github.com/kubeflow/kubeflow/pull/6861, https://github.com/kubeflow/kubeflow/pull/6923 🚀 https://hub.docker.com/r/kubeflownotebookswg/centraldashboard/tags
  • [x] Jupyter Web App: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6800 🚀 https://hub.docker.com/r/kubeflownotebookswg/jupyter-web-app/tags
  • [x] KServe: Agent: https://github.com/kserve/kserve/pull/2476, https://github.com/kserve/kserve/pull/2549 🚀 https://hub.docker.com/r/kserve/agent/tags
  • [x] KServe: Controller: https://github.com/kserve/kserve/pull/2476, https://github.com/kserve/kserve/pull/2550 🚀 https://hub.docker.com/r/kserve/kserve-controller/tags
  • [x] KServe: Models Web App: https://github.com/kserve/models-web-app/pull/45, https://github.com/kserve/models-web-app/pull/55 🚀 https://hub.docker.com/r/kserve/models-web-app/tags
  • [x] KServe: QPExt: https://github.com/kserve/kserve/pull/2604 🚀 https://hub.docker.com/r/kserve/qpext/tags
  • [x] KServe: Router: https://github.com/kserve/kserve/pull/2605 🚀 https://hub.docker.com/r/kserve/router/tags
  • [x] MPI Operator: https://github.com/kubeflow/mpi-operator/pull/489 🚀 https://hub.docker.com/r/mpioperator/mpi-operator/tags
  • [x] Notebook Controller: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6771 🚀 https://hub.docker.com/r/kubeflownotebookswg/notebook-controller/tags
  • [x] Profiles + KFAM: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6785, https://github.com/kubeflow/kubeflow/pull/6809 🚀 https://hub.docker.com/r/kubeflownotebookswg/profile-controller/tags 🚀 https://hub.docker.com/r/kubeflownotebookswg/kfam/tags
  • [x] Tensorboard Controller: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6805 🚀 https://hub.docker.com/r/kubeflownotebookswg/notebook-controller/tags
  • [x] Tensorboard Web App: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6810 🚀 https://hub.docker.com/r/kubeflownotebookswg/tensorboards-web-app/tags
  • [x] Training Operator: https://github.com/kubeflow/training-operator/pull/1674, https://github.com/kubeflow/training-operator/pull/1692 🚀 https://hub.docker.com/r/kubeflow/training-operator/tags
  • [x] Volumes Web App: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6811 🚀 https://hub.docker.com/r/kubeflownotebookswg/volumes-web-app/tags

Phase 2 Integrations (Kubeflow 1.9 scope)

  • [ ] KServe: PMML Server
  • [ ] KServe: AIX
  • [ ] KServe: Alibi
  • [ ] KServe: Art
  • [ ] Triton Inference Server (external)
  • [ ] Seldon: ML Server (external)
  • [ ] PyTorch: TorchServe (external)

Phase 3 Integrations (Kubeflow 1.10 scope)

Note: KFP is currently blocked by https://github.com/kubeflow/pipelines/issues/8660 / https://github.com/GoogleCloudPlatform/oss-test-infra/issues/1972

  • [ ] KFP: Application-CRD-Controller
  • [ ] KFP: Argoexec
  • [ ] KFP: Cache-Server
  • [ ] KFP: Frontend
  • [ ] KFP: Metadata Envoy
  • [ ] KFP: Persistence Agent
  • [ ] KFP: Scheduled Workflow
  • [ ] KFP: Workflow Controller
  • [ ] KFP: Viewer-CRD-Controller
  • [ ] KServe: LGB Server: blocked by https://github.com/pyca/cryptography/issues/7723
  • [ ] KServe: Paddle Server: blocked by https://github.com/pyca/cryptography/issues/7723
  • [ ] KServe: SKLearn Server: blocked by https://github.com/pyca/cryptography/issues/7723
  • [ ] KServe: XGB Server: blocked by https://github.com/pyca/cryptography/issues/7723
  • [ ] Katib: controller, db-manager, ui
  • [ ] Katib: file-metrics-collector
  • [ ] Katib: tfevent-metrics-collector
  • [ ] Katib: suggestion-hyperopt: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
  • [ ] Katib: suggestion-chocolate
  • [ ] Katib: suggestion-hyperband: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
  • [ ] Katib: suggestion-skopt: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
  • [ ] Katib: suggestion-goptuna
  • [ ] Katib: suggestion-optuna: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
  • [ ] Katib: suggestion-enas
  • [ ] Katib: suggestion-darts: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
  • [ ] Katib: suggestion-pbt: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
  • [ ] Katib: earlystopping-medianstop: https://github.com/kubeflow/katib/pull/2290

Phase 4 Integrations (Post Kubeflow 1.11 scope)

  • [ ] KFP: Api Server
  • [ ] KFP: Metadata Writer
  • [ ] KFP: Visualization Server
  • [ ] ml-metadata (KFP wheel dep.): https://github.com/google/ml-metadata/pull/171
  • [ ] KServe: Storage Initializer: blocked by https://github.com/pyca/cryptography/issues/7723
  • [ ] ~~OIDC Auth (external): https://github.com/arrikto/oidc-authservice/issues/104; on-hold as potentially irrelevant as of Kubeflow v1.8 (https://github.com/kubeflow/manifests/issues/2469)~~

lehrig avatar Oct 25 '22 18:10 lehrig

Thanks for creating this tracking issue @lehrig!

I'm on board with adding support for ppc64le, since this will greatly help KF adoption. The proposed plan makes sense.

My initial question at this time is whether we need to build different executables for this platform, which would mean we need a new set of images. I see in the PRs that the only change needed is to not set a specific platform, but I might be missing something.

Could you provide some more context on this one?

kimwnasptd avatar Oct 31 '22 12:10 kimwnasptd

@kimwnasptd, thanks for your support!

There are essentially 2 options for publishing images:

  1. Multi-arch images, where we publish only one "virtual" image (a manifest list) with support for multiple architectures. A pull command then fetches only the concrete container image for the required platform. To do so, I'd recommend using buildx (e.g., see https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/) because it is easier to use and more automated than manually creating a container manifest for multiple architectures.
  2. Separate images per architecture.

IMO 1. should be the preferred solution; a minimal example of the buildx flow is sketched below. A challenge here will be that builds across Kubeflow components are quite inconsistent; for example, some projects already use buildx while others don't. I'd opt for introducing more consistency in the scope of this endeavor, e.g., by migrating builds towards buildx where feasible. My team would be willing to drive this, if that sounds good.
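
For illustration, here is a minimal sketch of such a buildx flow (the builder name and image tag are placeholders, and this assumes Docker with the buildx plugin on an amd64 build machine):

```bash
# One-time setup on the build machine: register QEMU emulators so that
# non-native platforms such as ppc64le can be built on amd64.
docker run --privileged --rm tonistiigi/binfmt --install all

# Create and select a builder instance that supports multi-platform builds.
docker buildx create --name kf-multiarch --use

# Build one multi-arch image for amd64 and ppc64le and push it under a single
# tag; the registry then serves the right variant for each pulling platform.
docker buildx build \
  --platform linux/amd64,linux/ppc64le \
  --tag example.registry/kubeflow/some-component:latest \
  --push .
```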

lehrig avatar Nov 03 '22 08:11 lehrig

@lehrig I agree with the first approach as well, if it's viable to avoid having multiple manifests.

Docker's buildx seems promising. I hadn't used it in the past, but it seems quite straightforward. I don't have a hard preference on using buildx, as long as we don't lock ourselves in and end up with Dockerfiles that need specific Docker features and can only be built with Docker.

kimwnasptd avatar Nov 10 '22 16:11 kimwnasptd

As wished by @kimwnasptd, quoting myself from https://github.com/kubeflow/kubeflow/pull/6650#discussion_r1029586906 to clarify how we envision multi-arch builds:

Yes, it's good to let Go determine the arch, so we don't have to maintain an explicit arch list here or wrap the code in arch-specific if/else boilerplate.

Instead, we shift control over which arch is actually built to the build system. If we do nothing special, the arch of the build machine is simply used (and as Kubeflow is currently built on amd64, we stay backwards-compatible with the current behavior).

In further PRs, we will additionally modify Docker-based builds to use buildx, where you can, for instance, do something like this: docker buildx build --platform linux/amd64,linux/ppc64le ...

Here, Docker will actually run 2 builds: one for amd64 and one for ppc64le. When it reaches the above Go code, Go picks up the external platform configuration and builds correctly for it. If no native hardware is available for a given platform, Docker emulates the architecture using QEMU, so you can also build for different archs on amd64.

The final outcome is a single multi-arch image supporting all archs listed in Docker's --platform argument.
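
To make the cross-compilation point concrete, here is a rough sketch (package path and binary names are placeholders, not the actual Kubeflow layout): the target arch is selected purely by the build environment, so the source needs no arch list at all.

```bash
# Native build: GOOS/GOARCH default to the build machine (amd64 today).
go build -o bin/manager ./cmd/manager

# Cross-build for Power: only the environment changes, not the source code.
# buildx achieves the same effect per --platform entry, either via its
# TARGETOS/TARGETARCH build args or by running the build under QEMU.
GOOS=linux GOARCH=ppc64le CGO_ENABLED=0 go build -o bin/manager-ppc64le ./cmd/manager
```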

lehrig avatar Nov 23 '22 13:11 lehrig

@lehrig @adilhusain-s! the first image in this repo with support for ppc64le is up! 🎉

https://hub.docker.com/layers/kubeflownotebookswg/notebook-controller/80f695e/images/sha256-2870219816f6be1153ca97eb604b4f20393c34afdb4eade83f0966ccf90f8018?context=explore
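
For anyone who wants to double-check locally, the manifest list behind that tag can be inspected with a reasonably recent Docker CLI:

```bash
# Lists the per-architecture entries (amd64, ppc64le, ...) behind the single tag.
docker buildx imagetools inspect kubeflownotebookswg/notebook-controller:80f695e

# Equivalent view via the classic manifest command.
docker manifest inspect kubeflownotebookswg/notebook-controller:80f695e
```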

kimwnasptd avatar Dec 02 '22 14:12 kimwnasptd

@lehrig @pranavpandit1 @adilhusain-s I realized that right now we've implemented the docker buildx logic only in the Actions that run when a PR is merged, and not when a PR is opened.

I realized that the Centraldashboard image was not getting built, even though the PR checks were green: 28a24ffb170769a228d46a19892f7420b22a0816 74f020e0d9c3f58712a3b466f9d1bb86c4607beb 65e41bf28b8e79be4e1f822afe56e218c69db8a1

We fixed the issue for this in https://github.com/kubeflow/kubeflow/pull/6960, but we should be able to catch errors for the multi-arch build when a PR is opened as well.

The fix should be straightforward: we just need to use the same build command in both types of actions. Referencing the relevant parts of each: https://github.com/kubeflow/kubeflow/blob/master/.github/workflows/centraldb_intergration_test.yaml#L24 https://github.com/kubeflow/kubeflow/blob/master/.github/workflows/centraldb_docker_publish.yaml#L37-L41
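
In other words, both workflow types would run the same buildx invocation and only differ in whether they push (IMG and TAG below stand in for whatever variables the workflows already define):

```bash
# PR check: build both platforms so multi-arch breakage surfaces before merge,
# but do not push anything.
docker buildx build \
  --platform linux/amd64,linux/ppc64le \
  --tag "${IMG}:${TAG}" .

# Publish on merge: identical command, plus --push to publish the manifest list.
docker buildx build \
  --platform linux/amd64,linux/ppc64le \
  --tag "${IMG}:${TAG}" \
  --push .
```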

kimwnasptd avatar Feb 15 '23 13:02 kimwnasptd

Do you have cycles to help with this effort?

kimwnasptd avatar Feb 15 '23 13:02 kimwnasptd

@kimwnasptd let me confirm with the team but I think we can handle it

lehrig avatar Feb 15 '23 13:02 lehrig

@kimwnasptd let me confirm with the team but I think we can handle it

@kimwnasptd: Thanks for all the inputs, we have started looking into the required changes and will keep everyone updated once we start raising PRs for the same.

pranavpandit1 avatar Feb 17 '23 11:02 pranavpandit1

Note: I updated the main description by adding a phase for the Kubeflow 1.8 scope and linking to this new design document: https://docs.google.com/document/d/1nGUvLonahoLogfWCHsoUOZl-s77YtPEiCjWBVlZjJHo/edit?usp=sharing

We use this document to discuss Phase 2 with the KFP community (related to https://github.com/kubeflow/pipelines/issues/8660).

lehrig avatar Feb 17 '23 14:02 lehrig

Thanks @pranavpandit1! I also took a look at how to get these to work when fixing the workflows for the CentralDashboard. You can take a look at this PR and some comments: https://github.com/kubeflow/kubeflow/pull/6961

kimwnasptd avatar Feb 21 '23 11:02 kimwnasptd

@lehrig I think we bumped into a side effect that I hadn't thought about initially. Building the images in GH Actions (which uses emulation via QEMU) is actually slow.

Looking at an open PR https://github.com/kubeflow/kubeflow/pull/7060 that touches some web apps I see the following:

  1. The workflow that builds for both platforms (VWA) takes 61 minutes (!)
  2. The workflow that builds the old way (TWA) takes 9 minutes

The difference is huge, so I want to re-evaluate the approach of building for ppc64le when PRs are opened.

From your experience, in which cases is it most likely for the x86 build to succeed but the ppc64le build to fail?

kimwnasptd avatar Mar 24 '23 15:03 kimwnasptd

@kimwnasptd yeah, I agree that this is suboptimal. The answer is obviously "it depends"; however, I think we have some hard evidence here that we should not proceed as originally planned. I see the following options for builds when a PR is opened.

  1. Exclude non-x86 archs as long as native hardware is unavailable (example: https://github.com/DSpace/dspace-angular/pull/1667).
  2. Exclude non-x86 archs only if building them takes too long.
  3. Wait for native ppc64le out-of-the-box support in GHA, which hopefully comes this year (this would not slow down builds since no emulation is used).
  4. Integrate an SSH-based connection to native hardware we can provide into the workflow (see this example: https://github.com/adilhusain-s/multi-arch-docker-example/blob/main/.github/workflows/native_docker_builder.yaml#L31).
  5. Integrate a GitHub app that connects GHA builds to native hardware when needed (experimental).
  6. (not sure this is technically possible) Start the ppc64le QEMU build asynchronously & don't let the PR wait for the ppc64le build to complete, so it doesn't block.

Note: Options 1, 2 & 6 are based on my observation that ppc64le typically builds without errors whenever x86 builds without errors. Hence, we can usually accept PRs when only x86 is built; rare corner cases are then discovered on PR merge.

If exclusion (options 1 or 2) is OK, I'd go for that & later migrate to option 3 once native ppc64le GHA support becomes available later this year (a rough sketch of option 2 follows below). Option 4 is possible but would require additional effort and organization on our side, so I see it only as a backup option; same for option 5. Option 6 has not been tested so far, so I would not go for it at the moment.
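
For option 2, a rough sketch of what the workflows could do (the variable handling is illustrative, not the actual workflow content):

```bash
# Full platform matrix only on merge; PR checks stay amd64-only so the
# emulated ppc64le build does not blow the time budget.
if [ "${GITHUB_EVENT_NAME}" = "push" ]; then
  PLATFORMS="linux/amd64,linux/ppc64le"
else
  PLATFORMS="linux/amd64"
fi

docker buildx build --platform "${PLATFORMS}" --tag "${IMG}:${TAG}" .
```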

lehrig avatar Mar 27 '23 09:03 lehrig

Here are some stats that help to get a feeling for those options:

Notebook-controller build

  • Native ppc64le: 2.2 min
  • QEMU ppc64le: 12 min and 47 sec
  • Native x86: 2 min and 6 sec

Volume-web-app build

  • Native ppc64le: 8.21 min
  • QEMU ppc64le: 36 min and 35 sec
  • Native x86: 6 min and 56 sec

Central-dashboard build

  • Native ppc64le: 3.22 min
  • QEMU ppc64le: 10 min and 26 sec
  • Native x86: 1 min and 55 sec

lehrig avatar Mar 28 '23 11:03 lehrig

I discussed the options with the team. Here is our proposal:

  • On PR opened, we recommend option 2: build ppc64le only where it doesn't take too long, and otherwise disable ppc64le builds as showcased in https://github.com/DSpace/dspace-angular/pull/1667
  • Looking at the above stats, we believe "not too long" holds for all builds taking <= 30 min.
  • As soon as option 3 becomes available, migrate all workflows to it: run everything natively and enable it for all opened PRs.
  • On PR merged, we recommend always building all supported architectures.
  • We also recommend generally improving build performance by enabling caching during builds, which should lower build times by 30-40%: https://github.com/kubeflow/community/issues/779 (a rough sketch follows below). With caching enabled, more components will come in under the 30 min threshold.
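
A rough sketch of what buildx caching could look like (registry-backed cache shown here; the buildcache ref and the IMG/TAG variables are placeholders, and GitHub's built-in cache backend would be an alternative):

```bash
# Reuse layers from a previously exported cache and publish an updated cache
# next to the image, so repeat builds skip unchanged layers.
docker buildx build \
  --platform linux/amd64,linux/ppc64le \
  --tag "${IMG}:${TAG}" \
  --cache-from "type=registry,ref=${IMG}:buildcache" \
  --cache-to "type=registry,ref=${IMG}:buildcache,mode=max" \
  --push .
```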

@kimwnasptd, does that sound good? What do you think about the 30 min. threshold?

lehrig avatar Mar 28 '23 12:03 lehrig

Still have to answer this:

From your experience, in which cases is it most likely for the x86 build to succeed but the ppc64le build to fail?

This seldom happens once ppc64le support is in place. The only case that is a bit harder is new 3rd-party dependencies, for example additional Python wheels that are unavailable on ppc64le (wheels are thus in Phase 3 of this endeavor). With Go/Java/JS code we typically don't see these kinds of issues, as those ecosystems are more architecture-independent than the Python one.
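
As an aside, a quick way to check whether a dependency already ships prebuilt ppc64le wheels (package name and versions below are only examples):

```bash
# Ask pip for binary-only wheels matching a ppc64le manylinux platform tag;
# if this fails, the package would have to be built from source on Power.
pip download cryptography \
  --only-binary=:all: \
  --platform manylinux2014_ppc64le \
  --python-version 3.11 \
  --dest /tmp/ppc64le-wheels
```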

lehrig avatar Mar 28 '23 12:03 lehrig

@lehrig thanks for the detailed explanation! I agree with your proposal and rationale. So my current understanding is the following, but please tell me if I'm missing something:

  1. We can skip building multi-platform images when testing PRs, since we don't expect any issues
  2. Build for all architectures when a PR is merged, and have GHA build and publish the images
  3. Once we have native ppc64le support for GHA out-of-the-box we can migrate the workflows to use it

At the same time we can also work on caching in parallel: https://github.com/kubeflow/community/issues/779. Also, if in the future we see a lot of issues when building/pushing images across architectures, we can come back to evaluating multi-arch image builds for opened PRs.

kimwnasptd avatar Jun 04 '23 10:06 kimwnasptd

Updated the list of integrations by expanding the phases & adding some smaller images for KServe + Katib. KFP is still moving slowly as it builds in another CI system, so we will focus on KServe and Katib first.

lehrig avatar Aug 10 '23 10:08 lehrig

Let's continue this discussion in the community repo. /transfer community

andreyvelich avatar Oct 17 '24 15:10 andreyvelich

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 24 '25 00:05 github-actions[bot]

Our team is quite actively working on this; please keep this issue open.

lehrig avatar May 26 '25 16:05 lehrig

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Aug 25 '25 00:08 github-actions[bot]

Worked on the following dependencies and components:

  • Triton: PR raised to enable support for the Triton server with the Python backend (#8329)
  • ml-metadata: PR raised to build ml-metadata on Power with GCC-11 (#218)
  • KFP: Frontend: PR merged (#12125)

alhad-deshpande avatar Aug 25 '25 04:08 alhad-deshpande