Switch to GHCR due to docker.io pull rate limits
Validation Checklist
- [x] I confirm that this is a Kubeflow-related issue.
- [x] I am reporting this in the appropriate repository.
- [x] I have followed the Kubeflow installation guidelines.
- [x] The issue report is detailed and includes version numbers where applicable.
- [x] This issue pertains to Kubeflow development.
- [x] I am available to work on this issue.
- [x] You can join the CNCF Slack and access our meetings at the Kubeflow Community website. Our channel on the CNCF Slack is here #kubeflow-platform.
Version
master
Detailed Description
Docker Hub seems to be ending unauthenticated pulls from March 1, 2025.
We probably need to migrate all Docker images to GHCR as soon as possible, possibly before 1.10 final is cut.
https://docs.docker.com/docker-hub/usage/pulls/
Steps to Reproduce
Pull too many images
Screenshots or Videos (Optional)
No response
- [ ] KFP
- [ ] Katib
- [ ] Manifests/platform
- [ ] Trainer
- [ ] KServe
- [ ] Model Registry
Some of them are already on GHCR according to the maintainers.
https://github.com/kubeflow/manifests/blob/master/hack/trivy_scan.py can give us all images; it runs on each commit to master.
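For anyone who wants to reproduce such a list locally, here is a rough sketch of the idea (this is not the actual trivy_scan.py; the kustomize overlay and the regex are illustrative):

```python
# Rough sketch only (not the actual hack/trivy_scan.py): render kustomizations
# and collect every `image:` reference. Assumes `kustomize` is on the PATH and
# that this runs from a kubeflow/manifests checkout; the overlay list is illustrative.
import re
import subprocess

def images_in_kustomization(path: str) -> set:
    """Render a kustomization and return the image references found in it."""
    rendered = subprocess.run(
        ["kustomize", "build", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(re.findall(r"^\s*(?:-\s+)?image:\s*['\"]?([^'\"\s]+)", rendered, re.MULTILINE))

if __name__ == "__main__":
    all_images = set()
    for overlay in ["example"]:  # the real script walks the whole repository
        all_images |= images_in_kustomization(overlay)
    for image in sorted(all_images):
        print(image)
```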
@rimolive @tarekabouzeid
Here is the list
busybox:1.28
docker.io/istio/pilot:1.24.2
docker.io/istio/proxyv2:1.24.2
docker.io/kubeflow/model-registry-ui:v0.2.14
docker.io/kubeflowkatib/earlystopping-medianstop:v0.18.0-rc.0
docker.io/kubeflowkatib/enas-cnn-cifar10-cpu:v0.18.0-rc.0
docker.io/kubeflowkatib/file-metrics-collector:v0.18.0-rc.0
docker.io/kubeflowkatib/katib-controller:v0.18.0-rc.0
docker.io/kubeflowkatib/katib-db-manager:v0.18.0-rc.0
docker.io/kubeflowkatib/katib-ui:v0.18.0-rc.0
docker.io/kubeflowkatib/pytorch-mnist-cpu:v0.18.0-rc.0
docker.io/kubeflowkatib/suggestion-darts:v0.18.0-rc.0
docker.io/kubeflowkatib/suggestion-enas:v0.18.0-rc.0
docker.io/kubeflowkatib/suggestion-goptuna:v0.18.0-rc.0
docker.io/kubeflowkatib/suggestion-hyperband:v0.18.0-rc.0
docker.io/kubeflowkatib/suggestion-hyperopt:v0.18.0-rc.0
docker.io/kubeflowkatib/suggestion-optuna:v0.18.0-rc.0
docker.io/kubeflowkatib/suggestion-pbt:v0.18.0-rc.0
docker.io/kubeflowkatib/suggestion-skopt:v0.18.0-rc.0
docker.io/kubeflowkatib/tfevent-metrics-collector:v0.18.0-rc.0
docker.io/kubeflownotebookswg/centraldashboard:v1.10.0-rc.1
docker.io/kubeflownotebookswg/jupyter-web-app:v1.10.0-rc.1
docker.io/kubeflownotebookswg/kfam:v1.10.0-rc.1
docker.io/kubeflownotebookswg/notebook-controller:v1.10.0-rc.1
docker.io/kubeflownotebookswg/poddefaults-webhook:v1.10.0-rc.1
docker.io/kubeflownotebookswg/profile-controller:v1.10.0-rc.1
docker.io/kubeflownotebookswg/pvcviewer-controller:v1.10.0-rc.1
docker.io/kubeflownotebookswg/tensorboard-controller:v1.10.0-rc.1
docker.io/kubeflownotebookswg/tensorboards-web-app:v1.10.0-rc.1
docker.io/kubeflownotebookswg/volumes-web-app:v1.10.0-rc.1
docker.io/seldonio/mlserver:1.5.0
gcr.io/cloudsql-docker/gce-proxy:1.25.0
gcr.io/knative-releases/knative.dev/net-istio/cmd/controller@sha256:e70bc675f97778da144157f125b3001124ba7a5903b85dab9e77776352fea1c7
gcr.io/knative-releases/knative.dev/net-istio/cmd/webhook@sha256:7d76a6d42d139ed53aae3ca2dfd600b1c776eb85a17af64dd1b604176a4b132a
gcr.io/knative-releases/knative.dev/serving/cmd/activator@sha256:24c19cbee078925b91cd2e85082b581d53b218b410c083b1005dc06dc549b1d3
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler@sha256:5e9236452d89363957d4e7e249d57740a8fcd946aed23f8518d94962bf440250
gcr.io/knative-releases/knative.dev/serving/cmd/controller@sha256:5fb22b052e6bc98a1a6bbb68c0282ddb50744702acee6d83110302bc990666e9
gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:c61042001b1f21c5d06bdee9b42b5e4524e4370e09d4f46347226f06db29ba0f
gcr.io/knative-releases/knative.dev/serving/cmd/webhook@sha256:0fb5a4245aa4737d443658754464cd0a076de959fe14623fb9e9d31318ccce24
gcr.io/ml-pipeline/application-crd-controller:20231101
gcr.io/ml-pipeline/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance
gcr.io/ml-pipeline/mysql:8.0.26
gcr.io/ml-pipeline/workflow-controller:v3.4.17-license-compliance
gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/controller:v0.53.2@sha256:2cab05747826e7c32e2c588f0fefd354e03f643bd33dbe20533eada00562e6b1
gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/events:v0.53.2@sha256:0cf6f0be5319efdd8909ed8f987837d89146fd0632a744bf6d54bf83e5b13ca0
gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/resolvers:v0.53.2@sha256:6578d145acd9cd288e501023429439334de15de8bd77af132c57a1d5f982e940
gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/webhook:v0.53.2@sha256:1e8f8be3b51be378747b4589dde970582f50e1e69f59527f0a9aa7a75c5833e3
gcr.io/tfx-oss-public/ml_metadata_store_server:1.14.0
ghcr.io/dexidp/dex:v2.41.1
ghcr.io/kubeflow/kfp-api-server:2.4.0
ghcr.io/kubeflow/kfp-cache-deployer:2.4.0
ghcr.io/kubeflow/kfp-cache-server:2.4.0
ghcr.io/kubeflow/kfp-frontend:2.4.0
ghcr.io/kubeflow/kfp-inverse-proxy-agent:2.4.0
ghcr.io/kubeflow/kfp-metadata-envoy:2.4.0
ghcr.io/kubeflow/kfp-metadata-writer:2.4.0
ghcr.io/kubeflow/kfp-persistence-agent:2.4.0
ghcr.io/kubeflow/kfp-scheduled-workflow-controller:2.4.0
ghcr.io/kubeflow/kfp-viewer-crd-controller:2.4.0
ghcr.io/kubeflow/kfp-visualization-server:2.4.0
ghcr.io/metacontroller/metacontroller:v2.6.1
kserve/huggingfaceserver:v0.14.1
kserve/kserve-controller:v0.14.1
kserve/kserve-localmodel-controller:v0.14.1
kserve/kserve-localmodelnode-agent:v0.14.1
kserve/lgbserver:v0.14.1
kserve/models-web-app:v0.14.0-rc.0
kserve/paddleserver:v0.14.1
kserve/pmmlserver:v0.14.1
kserve/sklearnserver:v0.14.1
kserve/storage-initializer:v0.14.1
kserve/xgbserver:v0.14.1
kubeflow/model-registry-storage-initializer:latest
kubeflow/model-registry:v0.2.14
kubeflow/training-operator:v1-5170a36
mysql:8.0.29
mysql:8.0.3
mysql:8.0.39
nvcr.io/nvidia/tritonserver:23.05-py3
postgres:14.5-alpine
postgres:14.7-alpine3.17
python:3.9
pytorch/torchserve-kfs:0.9.0
quay.io/aipipeline/pipelineloop-controller:1.9.2
quay.io/aipipeline/pipelineloop-webhook:1.9.2
quay.io/aipipeline/tekton-exithandler-controller:2.0.5
quay.io/aipipeline/tekton-exithandler-webhook:2.0.5
quay.io/aipipeline/tekton-kfptask-controller:2.0.5
quay.io/aipipeline/tekton-kfptask-webhook:2.0.5
quay.io/brancz/kube-rbac-proxy:v0.13.1
quay.io/brancz/kube-rbac-proxy:v0.18.0
quay.io/brancz/kube-rbac-proxy:v0.8.0
tensorflow/serving:2.6.2
I checked the manifests manually for OCI images hosted on Docker Hub that will probably break for many users in March:
- [x] Istio: busybox:1.28; we should use registry.k8s.io/busybox as KFP does @juliusvonkohout
- [x] Istio: docker.io/istio/proxyv2:1.24.2 and docker.io/istio/pilot:1.24.2. We can probably use https://console.cloud.google.com/artifacts/docker/istio-release and https://console.cloud.google.com/artifacts/docker/istio-release/us/gcr.io/pilot?inv=1&invt=Abqa_w via `istioctl profile dump default --set global.hub=gcr.io/istio-release > profile.yaml` in https://github.com/kubeflow/manifests/blob/master/common/istio-1-24/README.md, and the CNI version @juliusvonkohout @tarekabouzeid @akagami-harsh
- [x] python:3.9 for PPC should anyway be updated to 3.12 in KFP multitenancy https://github.com/kubeflow/pipelines/pull/11669 @juliusvonkohout @hbelmiro @HumairAK
- [x] all kubeflownotebookswg/ images @thesuperzapper
- [ ] tensorboard tensorflow/tensorflow:2.5.1 @thesuperzapper
- [ ] VOLUME_VIEWER_IMAGE filebrowser/filebrowser:v2.25.0 @thesuperzapper
- [ ] docker.io/kubeflowkatib/ images and mysql:8.0.29 @andreyvelich @Electronic-Waste. You can use the image that KFP uses (gcr.io/ml-pipeline/mysql:8.0.26) and ask them to update it @hbelmiro @HumairAK
- [ ] kubeflow/training-operator:v1-5170a36 @andreyvelich @Electronic-Waste
- [ ] Spark @juliusvonkohout @vikas-saxena02
- [ ] KServe: kserve/ images and docker.io/seldonio/mlserver, tensorflow/serving, pytorch/torchserve-kfs @yuzisun @biswassri
@kubeflow/kubeflow-steering-committee
According to @thesuperzapper they shifted the deadline to April first.
@juliusvonkohout here's the official doc https://docs.docker.com/docker-hub/usage/ in case this helps.
"Starting April 1, 2025, all users with a Pro, Team, or Business subscription will have unlimited Docker Hub pulls with fair use. Unauthenticated users and users with a free Personal account have the following pull limits:
Unauthenticated users: 10 pulls/hour Authenticated users with a free account: 100 pulls/hour"
I recommend the ECR mirror as a replacement for the Docker library images, because this way we regularly get security updates for base images such as public.ecr.aws/docker/library/python:3.12; all self-built ones we can push to GHCR.
See also https://gallery.ecr.aws/docker/
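As a rough illustration of that mapping (an assumption on my side, not an existing Kubeflow script; the library list is just an example):

```python
# Assumption/illustration only (not an existing Kubeflow script): map Docker Hub
# library references such as `mysql:8.0.29` or `docker.io/library/python:3.12`
# to the public.ecr.aws mirror; anything else is left untouched.
import re

ECR_MIRROR = "public.ecr.aws/docker/library"
LIBRARY_IMAGES = {"python", "mysql", "postgres", "busybox"}  # example subset

def to_ecr_mirror(image: str) -> str:
    """Return the ECR-mirror reference for a Docker Hub library image, else the input."""
    match = re.fullmatch(r"(?:docker\.io/)?(?:library/)?([a-z0-9-]+):(.+)", image)
    if match and match.group(1) in LIBRARY_IMAGES:
        return f"{ECR_MIRROR}/{match.group(1)}:{match.group(2)}"
    return image

assert to_ecr_mirror("python:3.12") == "public.ecr.aws/docker/library/python:3.12"
assert to_ecr_mirror("kserve/sklearnserver:v0.15.0") == "kserve/sklearnserver:v0.15.0"
```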
There is quite some progress in https://github.com/kubeflow/manifests/issues/3010#issuecomment-2677977953
Work in progress for spark-operator... I will be raising the PR there in a day or two.
Raised kubeflow/spark-operator#2483. The corresponding issue in the spark-operator repo is kubeflow/spark-operator#2480.
I have added a hold label to it, as I am waiting for the maintainers of spark-operator to confirm the steps to test the change.
@juliusvonkohout spark-operator has been taken care of.

> @juliusvonkohout spark-operator has been taken care of.

Do they have it in the latest release that we can synchronize? You can also do that with the scripts under /scripts as soon as there is a release.

> @juliusvonkohout spark-operator has been taken care of.
>
> Do they have it in the latest release that we can synchronize? You can also do that with the scripts under /scripts as soon as there is a release.

@juliusvonkohout I will have to check with them.
To ALL COMPONENT MAINTAINERS: @andyatmiami created a script to mirror tags from docker.io to ghcr.io, which we used to migrate the old vX.X.X tags of all the kubeflow/kubeflow images.
I suggest that we use this script to mirror the release tags of other components too:
- It ensures we have full history of the release tags on GHCR
- It lets users who can't upgrade immediately (or who need to use an old version) access the images on GHCR
PS: If people need it, I have a business Docker Hub account that I can use to avoid getting rate-limited while pulling the old tags, but I only have write access to the GHCR images that the Notebooks WG owns, so it might be hard for me to push.
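For illustration, a minimal sketch of the mirroring approach (this is NOT @andyatmiami's actual script; the repository mapping and tags are placeholders, and pushing needs write access on the GHCR side):

```python
# Rough sketch of the idea only: mirror selected release tags from docker.io to
# ghcr.io. The source/target mapping and the tag list are placeholders; pushing
# requires `docker login ghcr.io` with write access to the target repositories.
import subprocess

MIRRORS = {
    # docker.io source                              -> ghcr.io target (example values)
    "docker.io/kubeflownotebookswg/jupyter-web-app": "ghcr.io/example-org/jupyter-web-app",
}
TAGS = ["v1.8.0", "v1.9.2"]  # release tags to preserve, example values

def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

for source, target in MIRRORS.items():
    for tag in TAGS:
        run("docker", "pull", f"{source}:{tag}")
        run("docker", "tag", f"{source}:{tag}", f"{target}:{tag}")
        run("docker", "push", f"{target}:{tag}")
```

In practice, a tool such as crane or skopeo (`skopeo copy --all`) is preferable to docker pull/tag/push, because it copies multi-arch manifest lists directly between registries without pulling layers locally.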
> I recommend the ECR mirror as a replacement for the Docker library images, because this way we regularly get security updates for base images such as public.ecr.aws/docker/library/python:3.12; all self-built ones we can push to GHCR.

It is not recommended to use public.ecr.aws mirrors: due to the special authentication method of AWS, third-party companies cannot deploy their own intranet cache based on AWS mirrors, which may lead to difficulties in maintaining subsequent Kubeflow versions. https://github.com/distribution/distribution/issues/4252 https://github.com/distribution/distribution/issues/4383
> I recommend the ECR mirror as a replacement for the Docker library images, because this way we regularly get security updates for base images such as public.ecr.aws/docker/library/python:3.12; all self-built ones we can push to GHCR.
>
> It is not recommended to use public.ecr.aws mirrors: due to the special authentication method of AWS, third-party companies cannot deploy their own intranet cache based on AWS mirrors, which may lead to difficulties in maintaining subsequent Kubeflow versions. distribution/distribution#4252 distribution/distribution#4383
Do you have an alternative?
Current OCI images on Docker Hub:
busybox:1.28 (Maybe Istio or KFP)
mysql:8.0.29 (KFP + Katib)
mysql:8.3.0 (KFP + Katib)
postgres:14.5-alpine (KFP)
postgres:14.7-alpine3.17 (KFP)
pytorch/torchserve-kfs:0.9.0 (Kserve)
tensorflow/serving:2.6.2 (Kserve)
docker.io/seldonio/mlserver:1.5.0
kserve/huggingfaceserver:v0.15.0
kserve/huggingfaceserver:v0.15.0-gpu
kserve/kserve-controller:v0.15.0
kserve/kserve-localmodel-controller:v0.15.0
kserve/lgbserver:v0.15.0
kserve/paddleserver:v0.15.0
kserve/pmmlserver:v0.15.0
kserve/sklearnserver:v0.15.0
kserve/storage-initializer:v0.15.0
kserve/xgbserver:v0.15.0
@vikas-saxena02 @biswassri @terrytangyuan what is the status with KServe? Some of the KServe images also have massive CVEs, see https://github.com/kubeflow/manifests/actions/runs/15414880021/job/43375235285, which are probably relevant for KServe's graduation. The main offenders are nvcr.io/nvidia/tritonserver:23.05-py3 and kserve/huggingfaceserver:v0.15.0-gpu.
@vikas-saxena02 @biswassri can you check where busybox, postgres and mysql come from upstream?
{
"data": [
{
"image": "kserve/storage-initializer:v0.15.0",
"severity_counts": {
"LOW": 74,
"MEDIUM": 33,
"HIGH": 7,
"CRITICAL": 2
}
},
{
"image": "kserve/paddleserver:v0.15.0",
"severity_counts": {
"LOW": 76,
"MEDIUM": 33,
"HIGH": 8,
"CRITICAL": 2
}
},
{
"image": "kserve/huggingfaceserver:v0.15.0-gpu",
"severity_counts": {
"LOW": 168,
"MEDIUM": 1202,
"HIGH": 11,
"CRITICAL": 4
}
},
{
"image": "pytorch/torchserve-kfs:0.9.0",
"severity_counts": {
"LOW": 233,
"MEDIUM": 1683,
"HIGH": 97,
"CRITICAL": 8
}
},
{
"image": "ghcr.io/kserve/models-web-app:v0.14.0",
"severity_counts": {
"LOW": 74,
"MEDIUM": 36,
"HIGH": 5,
"CRITICAL": 1
}
},
{
"image": "kserve/sklearnserver:v0.15.0",
"severity_counts": {
"LOW": 74,
"MEDIUM": 33,
"HIGH": 7,
"CRITICAL": 2
}
},
{
"image": "docker.io/seldonio/mlserver:1.5.0",
"severity_counts": {
"LOW": 88,
"MEDIUM": 123,
"HIGH": 50,
"CRITICAL": 2
}
},
{
"image": "nvcr.io/nvidia/tritonserver:23.05-py3",
"severity_counts": {
"LOW": 526,
"MEDIUM": 3556,
"HIGH": 123,
"CRITICAL": 0
}
},
{
"image": "quay.io/brancz/kube-rbac-proxy:v0.18.0",
"severity_counts": {
"LOW": 1,
"MEDIUM": 3,
"HIGH": 1,
"CRITICAL": 1
}
},
{
"image": "kserve/pmmlserver:v0.15.0",
"severity_counts": {
"LOW": 86,
"MEDIUM": 90,
"HIGH": 27,
"CRITICAL": 2
}
},
{
"image": "kserve/xgbserver:v0.15.0",
"severity_counts": {
"LOW": 76,
"MEDIUM": 33,
"HIGH": 7,
"CRITICAL": 2
}
},
{
"image": "kserve/lgbserver:v0.15.0",
"severity_counts": {
"LOW": 76,
"MEDIUM": 33,
"HIGH": 8,
"CRITICAL": 2
}
},
{
"image": "tensorflow/serving:2.6.2",
"severity_counts": {
"LOW": 61,
"MEDIUM": 40,
"HIGH": 4,
"CRITICAL": 0
}
},
{
"image": "kserve/huggingfaceserver:v0.15.0",
"severity_counts": {
"LOW": 141,
"MEDIUM": 1198,
"HIGH": 11,
"CRITICAL": 4
}
},
{
"image": "kserve/kserve-controller:v0.15.0",
"severity_counts": {
"LOW": 0,
"MEDIUM": 2,
"HIGH": 1,
"CRITICAL": 0
}
},
{
"image": "kserve/kserve-localmodel-controller:v0.15.0",
"severity_counts": {
"LOW": 0,
"MEDIUM": 2,
"HIGH": 0,
"CRITICAL": 0
}
}
]
}
> what is the status with KServe? Some of the KServe images also have massive CVEs
@juliusvonkohout I have a PR with all the changes in. I have one LGTM and am just waiting for the team's final review on it. I'll take care of the CVEs in a separate PR for the affected images.