Umbrella Issue: Porting Kubeflow to IBM Power (ppc64le)
/kind feature
Enable builds & releases for IBM Power (ppc64le architecture). This proposal was presented (with slides) at the 2022-10-25 Kubeflow community call and received positive community feedback. A design document is also available: https://docs.google.com/document/d/1nGUvLonahoLogfWCHsoUOZl-s77YtPEiCjWBVlZjJHo/edit?usp=sharing
Why you need this feature:
- Widen scope of possible on-premises deployments (vanilla Kubernetes & OpenShift on Power)
- Greater independence from any single processor architecture (x86, ppc64le, arm, …)
- Unified container builds
Describe the solution you'd like:
- Upstreaming changes that allow building the Dockerfiles for multiple architectures (starting with x86 & ppc64le)
- Upstreaming CI integration for multi-arch builds (starting with x86 & ppc64le)
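As a rough illustration of what a multi-arch build step could look like, the sketch below assembles a `docker buildx build` invocation for the two starting architectures. It is not taken from any of the linked PRs: the helper name `buildx_command`, the image name, and the default build context are hypothetical, and it assumes a Docker installation with the buildx plugin and a builder that can target both platforms.

```python
import shlex

# The two starting architectures from this issue, in Docker's os/arch notation
# (x86 is spelled linux/amd64 there).
PLATFORMS = ["linux/amd64", "linux/ppc64le"]


def buildx_command(image, context=".", platforms=PLATFORMS, push=False):
    """Return the argv list for a multi-arch `docker buildx build` call.

    `image` and `context` are placeholders chosen by the caller; nothing
    here is specific to a particular Kubeflow component.
    """
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # comma-separated target platforms
        "--tag", image,
    ]
    if push:
        # Push the resulting multi-arch image to the registry.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Hypothetical image name, for illustration only.
    print(shlex.join(buildx_command("example.org/demo/component:dev")))
```

The function only builds the argument list, so it can be inspected or logged in CI before being handed to `subprocess.run`. Extending the scope to another architecture would then amount to adding one entry to `PLATFORMS`.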
We currently plan to divide our efforts into multiple phases:
1. low-hanging "easy" integrations where no or only minor code changes are needed; excluding KFP; Kubeflow 1.7 release scope (✅ done),
2. same as 1., but now including additional KServe components for model serving; Kubeflow 1.8 release scope,
3. same as 1., but now including KFP; Kubeflow 1.9 release scope,
4. more complex integrations with external dependencies on Python wheels.
Below is a detailed overview of each required integration, including links to associated PRs if those already exist.
Phase 1 Integrations (Kubeflow 1.7 scope)
- [x] Poddefaults (Admission) Webhook: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6803 🚀 https://hub.docker.com/r/kubeflownotebookswg/poddefaults-webhook/tags
- [x] Central Dashboard: https://github.com/kubeflow/kubeflow/pull/6861, https://github.com/kubeflow/kubeflow/pull/6923 🚀 https://hub.docker.com/r/kubeflownotebookswg/centraldashboard/tags
- [x] Jupyter Web App: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6800 🚀 https://hub.docker.com/r/kubeflownotebookswg/jupyter-web-app/tags
- [x] KServe: Agent: https://github.com/kserve/kserve/pull/2476, https://github.com/kserve/kserve/pull/2549 🚀 https://hub.docker.com/r/kserve/agent/tags
- [x] KServe: Controller: https://github.com/kserve/kserve/pull/2476, https://github.com/kserve/kserve/pull/2550 🚀 https://hub.docker.com/r/kserve/kserve-controller/tags
- [x] KServe: Models Web App: https://github.com/kserve/models-web-app/pull/45, https://github.com/kserve/models-web-app/pull/55 🚀 https://hub.docker.com/r/kserve/models-web-app/tags
- [x] KServe: QPExt: https://github.com/kserve/kserve/pull/2604 🚀 https://hub.docker.com/r/kserve/qpext/tags
- [x] KServe: Router: https://github.com/kserve/kserve/pull/2605 🚀 https://hub.docker.com/r/kserve/router/tags
- [x] MPI Operator: https://github.com/kubeflow/mpi-operator/pull/489 🚀 https://hub.docker.com/r/mpioperator/mpi-operator/tags
- [x] Notebook Controller: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6771 🚀 https://hub.docker.com/r/kubeflownotebookswg/notebook-controller/tags
- [x] Profiles + KFAM: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6785, https://github.com/kubeflow/kubeflow/pull/6809 🚀 https://hub.docker.com/r/kubeflownotebookswg/profile-controller/tags 🚀 https://hub.docker.com/r/kubeflownotebookswg/kfam/tags
- [x] Tensorboard Controller: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6805 🚀 https://hub.docker.com/r/kubeflownotebookswg/tensorboard-controller/tags
- [x] Tensorboard Web App: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6810 🚀 https://hub.docker.com/r/kubeflownotebookswg/tensorboards-web-app/tags
- [x] Training Operator: https://github.com/kubeflow/training-operator/pull/1674, https://github.com/kubeflow/training-operator/pull/1692 🚀 https://hub.docker.com/r/kubeflow/training-operator/tags
- [x] Volumes Web App: https://github.com/kubeflow/kubeflow/pull/6650, https://github.com/kubeflow/kubeflow/pull/6811 🚀 https://hub.docker.com/r/kubeflownotebookswg/volumes-web-app/tags
Phase 2 Integrations (Kubeflow 1.9 scope)
- [ ] KServe: PMML Server
- [ ] KServe: AIX
- [ ] KServe: Alibi
- [ ] KServe: Art
- [ ] Triton Inference Server (external)
- [ ] Seldon: ML Server (external)
- [ ] PyTorch: TorchServe (external)
Phase 3 Integrations (Kubeflow 1.10 scope)
Note: KFP is currently blocked by https://github.com/kubeflow/pipelines/issues/8660 / https://github.com/GoogleCloudPlatform/oss-test-infra/issues/1972
- [ ] KFP: Application-CRD-Controller
- [ ] KFP: Argoexec
- [ ] KFP: Cache-Server
- [ ] KFP: Frontend
- [ ] KFP: Metadata Envoy
- [ ] KFP: Persistence Agent
- [ ] KFP: Scheduled Workflow
- [ ] KFP: Workflow Controller
- [ ] KFP: Viewer-CRD-Controller
- [ ] KServe: LGB Server: blocked by https://github.com/pyca/cryptography/issues/7723
- [ ] KServe: Paddle Server: blocked by https://github.com/pyca/cryptography/issues/7723
- [ ] KServe: SKLearn Server: blocked by https://github.com/pyca/cryptography/issues/7723
- [ ] KServe: XGB Server: blocked by https://github.com/pyca/cryptography/issues/7723
- [ ] Katib: controller, db-manager, ui
- [ ] Katib: file-metrics-collector
- [ ] Katib: tfevent-metrics-collector
- [ ] Katib: suggestion-hyperopt: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
- [ ] Katib: suggestion-chocolate
- [ ] Katib: suggestion-hyperband: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
- [ ] Katib: suggestion-skopt: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
- [ ] Katib: suggestion-goptuna
- [ ] Katib: suggestion-optuna: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
- [ ] Katib: suggestion-enas
- [ ] Katib: suggestion-darts: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
- [ ] Katib: suggestion-pbt: https://github.com/kubeflow/katib/pull/2262, https://github.com/kubeflow/katib/pull/2290
- [ ] Katib: earlystopping-medianstop: https://github.com/kubeflow/katib/pull/2290
Phase 4 Integrations (Post Kubeflow 1.11 scope)
- [ ] KFP: Api Server
- [ ] KFP: Metadata Writer
- [ ] KFP: Visualization Server
- [ ] ml-metadata (KFP wheel dep.): https://github.com/google/ml-metadata/pull/171
- [ ] KServe: Storage Initializer: blocked by https://github.com/pyca/cryptography/issues/7723
- [ ] ~~OIDC Auth (external): https://github.com/arrikto/oidc-authservice/issues/104; on-hold as potentially irrelevant as of Kubeflow v1.8 (https://github.com/kubeflow/manifests/issues/2469)~~