spark-operator
[BUG] Spark-operator 1.2.13 crashes because of missing image
Description
The Spark operator pod fails to start: the image pull fails with ErrImagePull, then ImagePullBackOff.
- [x] ✋ I have searched the open/closed issues and my issue is not listed.
Reproduction Code [Required]
Steps to reproduce the behavior:
$ helm repo add spark-operator https://kubeflow.github.io/spark-operator
"spark-operator" has been added to your repositories
$ helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace
NAME: my-release
LAST DEPLOYED: Thu Apr 25 10:13:49 2024
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ helm list -A
NAME        NAMESPACE       REVISION  UPDATED                                   STATUS    CHART                  APP VERSION
my-release  spark-operator  1         2024-04-25 10:24:26.291749107 +0200 CEST  deployed  spark-operator-1.2.13  v1beta2-1.4.4-3.5.0
$ kubectl get pods -n spark-operator
NAME                                         READY   STATUS         RESTARTS   AGE
my-release-spark-operator-5cbd8bb556-nr5nb   0/1     ErrImagePull   0          30s
$ kubectl describe pods -n spark-operator my-release-spark-operator-5cbd8bb556-nr5nb
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  43s                default-scheduler  Successfully assigned spark-operator/my-release-spark-operator-5cbd8bb556-nr5nb to kind-control-plane
  Normal   BackOff    18s (x2 over 41s)  kubelet            Back-off pulling image "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0"
  Warning  Failed     18s (x2 over 41s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling    5s (x3 over 43s)   kubelet            Pulling image "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0"
  Warning  Failed     4s (x3 over 41s)   kubelet            Failed to pull image "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0": failed to resolve reference "docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  Warning  Failed     4s (x3 over 41s)   kubelet            Error: ErrImagePull
Expected behavior
The Spark operator pod should start successfully.
Actual behavior
The Spark operator pod never becomes ready: the image pull fails with ErrImagePull, then ImagePullBackOff.
Environment & Versions
- Spark Operator App version: v1beta2-1.4.4-3.5.0
- Helm Chart Version: 1.2.13
- Kubernetes Version: 1.27.3
- Apache Spark version: Not applicable
Try the fixes suggested in an earlier issue: https://github.com/kubeflow/spark-operator/issues/1991
I tried --set 'image.repository=ghcr.io/googlecloudplatform/spark-operator'
as proposed in #1991, but it did not solve this issue.
The suggestion in the last message of that thread, --set image.repository=ghcr.io/kubeflow/spark-operator --set image.tag=v1beta2-1.4.3-3.5.0,
works for me.
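For anyone landing here, the workaround above written out as a full command might look like the sketch below. The release name and namespace are the ones from the reproduction steps. Two things are assumptions added for this sketch: helm upgrade --install is used instead of helm install so the command also applies to a release that already exists, and the APPLY switch is a hypothetical guard so the command can be reviewed before it is run.

```shell
# Sketch of the workaround: pull the operator image from ghcr.io with an
# explicit tag instead of the chart default that failed to pull.
RELEASE=my-release                                # release name from the report
NAMESPACE=spark-operator                          # namespace from the report
IMAGE_REPOSITORY=ghcr.io/kubeflow/spark-operator  # repository reported to work
IMAGE_TAG=v1beta2-1.4.3-3.5.0                     # tag reported to work

# Collect the helm arguments once so they can be printed and then reused.
set -- upgrade --install "$RELEASE" spark-operator/spark-operator \
  --namespace "$NAMESPACE" --create-namespace \
  --set "image.repository=$IMAGE_REPOSITORY" \
  --set "image.tag=$IMAGE_TAG"

HELM_COMMAND="helm $*"
echo "$HELM_COMMAND"

# APPLY is a hypothetical switch for this sketch: run with APPLY=1 to execute.
if [ "${APPLY:-0}" = "1" ]; then
  helm "$@"
fi
```

Without APPLY=1 the script only prints the command, which makes it easy to check the repository and tag before touching the cluster.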
We just released a new image update with important registry fixes. Check it out:
Image tag: https://github.com/kubeflow/spark-operator/tree/v1beta2-1.4.5-3.5.0
Helm chart: https://github.com/kubeflow/spark-operator/releases/tag/spark-operator-chart-1.2.14
Please give it a try and let us know if you encounter any issues. We're working on a new Kubeflow Spark Operator release, and your testing will help make it stable. Feel free to share feedback on the Kubeflow Spark operator channel.
Thanks!
This command works fine: helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --version 1.2.14
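To confirm that an install or upgrade no longer references the image that failed, one could list what the operator Deployment actually points at. This is only a sketch: is_failing_image and check_operator_images are hypothetical helper names, the namespace default is the one from the report, and the only image treated as bad is the exact reference from the kubelet events above. A different image is not guaranteed to be pullable, so the pod status still needs checking.

```shell
# Exact image reference that failed to pull in the report above.
FAILING_IMAGE="docker.io/kubeflow/spark-operator:v1beta2-1.4.4-3.5.0"

# Hypothetical helper: succeeds when the given reference is the failing one.
is_failing_image() {
  [ "$1" = "$FAILING_IMAGE" ]
}

# Hypothetical helper: prints the container images of every Deployment in the
# namespace (default: spark-operator) and flags the failing reference.
check_operator_images() {
  namespace="${1:-spark-operator}"
  kubectl get deployments -n "$namespace" \
    -o jsonpath='{range .items[*]}{range .spec.template.spec.containers[*]}{.image}{"\n"}{end}{end}' |
  while IFS= read -r image; do
    if is_failing_image "$image"; then
      echo "still the failing image: $image"
    else
      echo "different image: $image"
    fi
  done
}
```

After the install, running check_operator_images followed by kubectl get pods -n spark-operator shows which image is in use and whether the pod reaches Running.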