Cannot deploy to k8s on AArch64 nodes using manifests in repo
Describe the bug
I'm unable to deploy ModelRegistry to k8s on my Apple silicon MacBook when following the official instructions here. This appears to be because the mysql container image (mysql:8.0.3) defined in the db overlay kustomization is only available for the amd64 architecture.
To Reproduce
Steps to reproduce the behavior:
- Deploy to a k8s cluster running on AArch64 (in my case I used a kind cluster on an Apple silicon Mac) by running:
kubectl apply -k "https://github.com/kubeflow/model-registry/manifests/kustomize/overlays/db?ref=v0.2.3-alpha"
- Run:
kubectl get pods -n kubeflow
- Observe that the pod is in a CrashLoopBackOff
- Run:
kubectl logs model-registry-db-xxxxxx
(where xxxxxx is found from the output of the above command)
- Observe an error message similar to:
Error from server (BadRequest): container "db-container" in pod "model-registry-db-7c84c4cfc8-cn4cx" is waiting to start: trying and failing to pull image
Expected behavior
The DB pod should not get into a CrashLoopBackOff and should be able to pull the image.
Additional context
I've tried changing the version in the manifest to mysql:8.0.39, the closest version with an AArch64 image, and I'm able to get MR running as expected. Would you consider upgrading the mysql image in the manifests to allow ModelRegistry to be run on Macs with Apple silicon?
As it's only a one line change, I've created a PR for this here: https://github.com/kubeflow/model-registry/pull/267
Obviously, this may not be an appropriate change, so no hard feelings if it needs to be closed!
This is a symptom of a broader issue: you cannot install Kubeflow on KinD/Minikube on a Mac. It impacts Model Registry because Google's MLMD image is only available for x86, and Model Registry wraps MLMD by design, here:
https://github.com/kubeflow/model-registry/blob/feeb0dcd8315826ddbfbbd97d4dbd764e9eb3a27/manifests/kustomize/base/model-registry-deployment.yaml#L54
Here are some relevant discussions in KF community:
- https://github.com/kubeflow/pipelines/issues/10309
- https://github.com/kubeflow/manifests/issues/2745#issuecomment-2189135232
See also:
- https://github.com/google/ml-metadata/issues/143#issuecomment-1860066218
- https://github.com/google/ml-metadata/pull/188
- https://github.com/kubeflow/pipelines/issues/10309
- https://github.com/google/ml-metadata/pull/166
- https://github.com/google/ml-metadata/issues/190
If you are looking for a way to develop with Model Registry locally on Mac, I recommend using the docker-compose in the root of the repository. I personally use it with Podman Desktop and it works; happy to share tips if you encounter issues. It works because the podman machine lets you leverage Rosetta emulation for the container.
Thanks for the references, it's useful to understand the ongoing discussions around KubeFlow / MLMD etc.
I'm specifically looking to develop against model registry deployed standalone (i.e. without kubeflow) on k8s. This is necessary for the UI BFF work as I need to be able to access the service, retrieve labels, annotations etc.
I found that by merely committing a slight version bump to the MySQL image (i.e. to the closest version with an aarch64 image) I'm able to deploy locally to kind without issue in this manner.
I'm using the kustomize config on my own fork for now, which is no real hardship for me, but it would be useful to be able to write a guide for some other developers I'd like to onboard to this workflow without them having to use a manifest on my fork if possible.
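One possible way to avoid pointing other developers at a fork is a small local overlay that reuses the upstream db overlay and only swaps the image tag. This is a minimal sketch: the directory name `mr-arm64-overlay` and the helper `write_overlay` are hypothetical, and it assumes kustomize's `images` transformer matches the image name `mysql` used in the upstream manifest.

```shell
#!/bin/sh
# Sketch: write a local kustomize overlay that pulls in the upstream db overlay
# and overrides only the MySQL image tag. Directory name is hypothetical.
write_overlay() {
  overlay_dir="$1"
  mkdir -p "$overlay_dir"
  cat > "$overlay_dir/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://github.com/kubeflow/model-registry/manifests/kustomize/overlays/db?ref=v0.2.3-alpha
images:
  # Assumption: the upstream manifest references the image by the name "mysql".
  - name: mysql
    newTag: 8.0.39
EOF
}

write_overlay ./mr-arm64-overlay
# Then, against a running cluster:
#   kubectl apply -k ./mr-arm64-overlay
```

With this approach the upstream manifests stay untouched, and the override can be dropped once the image is bumped upstream.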
> This is a symptom of a broader issue that you cannot install Kubeflow on a KinD/Minikube on Mac, and this is impacting Model Registry because the Google's MLMD image is not available except for x86, and Model Registry wraps MLMD by design
I would treat this as a separate, general issue with MLMD. Maybe we are reaching the end of our rope with MLMD and its idiosyncrasies. We need to decide soon whether we want to get out of it and simplify the architecture before we move away from the "alpha" release.
I know this will alienate the KFP team from integration with Model Registry, which was the original reason we chose MLMD in the first place. I really do think we need to do a quick study with the KFP team to assess the footprint of MLMD they are using, to see what it takes to replace MLMD, or else drop the aspiration to support KFP and choose our own path.
BTW, what I am suggesting is to keep the schema side of MLMD but bring the DB access directly into the Model Registry REST server and remove the additional container for it. We can consider a new explicit layer to integrate with KFP, rather than implicitly weaving it into Model Registry as was our original intention. wdyt? @tarilabs @dhirajsb @rimolive
> I'm able to deploy locally to kind without issue in this manner.
@alexcreasy this is interesting, I can't recall if I ever tried with KinD. So can you kindly confirm you are using:
- an M-chip Mac
- macOS
- Podman
- KinD
and then the only issue for you was allegedly the MySQL version?
@tarilabs Yes, that's right, I'm using:
- MacBook Pro 14" with M3 Pro CPU
- macOS Sonoma 14.6.1
- Podman Desktop v1.12.0 (Podman v5.2.0)
- KinD 0.23.0
MR is currently using the 8.0.3 MySQL image -- you can see there's no arm64 build: https://hub.docker.com/layers/library/mysql/8.3/images/sha256-f9097d95a4ba5451fff79f4110ea6d750ac17ca08840f1190a73320b84ca4c62?context=explore
The latest 8.0.x image does have one: https://hub.docker.com/layers/library/mysql/8.0.39/images/sha256-7b4902b99989615deaa12a3af4e32f21e9b32a862d6856d121dd44ca71c166ed?context=explore
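The same architecture check can be scripted instead of browsing Docker Hub. The sketch below uses a hypothetical helper, `has_arch`, that reads manifest JSON on stdin and succeeds if the given architecture is listed. It assumes the input is a multi-arch manifest list such as the one printed by `docker manifest inspect`; a single-architecture image may not include a `platform` entry at all, in which case the check simply fails.

```shell
#!/bin/sh
# Sketch: succeed if the manifest JSON on stdin lists the given architecture.
# Matches both pretty-printed and compact JSON.
has_arch() {
  grep -Eq "\"architecture\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Typical use, assuming Docker and network access are available:
#   docker manifest inspect mysql:8.0.39 | has_arch arm64 && echo "arm64 available"
```

A text match like this is deliberately crude; a JSON-aware tool would be more robust if one is installed.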
I haven't done any deep testing yet, I'm still finding my way around the project, but to confirm, I was able to deploy MR to KinD locally on this MacBook without any errors showing on the deployed pods. I was then able to smoke test by curling the endpoint shown in the getting started guide and received a 200 response.
this is awesome to hear, thank you @alexcreasy ; I believe I only tried a similar combination before Podman wired in Rosetta, so I'm happy to hear it is a lot simpler nowadays
I want to augment the comment in https://github.com/kubeflow/model-registry/pull/267#pullrequestreview-2244881693 with the information that the default vanilla installation uses a custom MySQL image from Google Container Registry:
https://github.com/kubeflow/manifests/blob/a38c2be88fbafb0844c0231f0062e4b3719d4737/apps/pipeline/upstream/third-party/mysql/base/mysql-deployment.yaml#L51
👉
gcr.io/ml-pipeline/mysql:8.0.26
(source)
I ran a successful experiment on a fresh M3 Mac and then defined a procedure here, HTH
I just submitted a PR that should fix this issue. https://github.com/kubeflow/model-registry/pull/703
Long story short: it looks like mysql:8.0.3 may have been confused with 8.3.0 by accident, given the example outputs in the README.md. 8.0.3 is 7 years old.
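To make the ordering of the three tags mentioned in this thread explicit, here is a small sketch that sorts them as versions. The helper name `sort_tags` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils does, as do recent BSD versions).

```shell
#!/bin/sh
# Sketch: print the given version tags in ascending version order.
sort_tags() {
  printf '%s\n' "$@" | sort -V
}

sort_tags 8.3.0 8.0.3 8.0.39
# prints:
# 8.0.3
# 8.0.39
# 8.3.0
```

So 8.0.3 is the oldest of the three, well behind both the latest 8.0.x image and 8.3.0.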
Hi, this is a note for the record: although we have not received any feedback on the enquiry
- https://github.com/kubeflow/pipelines/discussions/11224
despite also following up with Liaisons and community in KF Release meetings, etc., I'm proposing to progress further on merge of
- https://github.com/kubeflow/model-registry/pull/267
in order also to best support contributor development from Mac/ARM.
So the plan is to merge #267 to refresh the dependency images and help with local dev.
If you have any concerns, please raise them at the latest by the KF MR biweekly meeting currently scheduled for 2025-02-03.
ps.: after #267, we can also consider #703, but I would prefer to keep FIFO order
we still intend to merge #703 in a subsequent MR release(s)