[BUG] Failed to pull image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1"

Open vzhao12 opened this issue 10 months ago • 9 comments

Description

Unable to start a Spark job in Kubernetes.

  • [*] ✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

Steps to reproduce the behavior:

  1. Set up a new Kubernetes cluster. I set one up in gcloud.
  2. Get the Kubernetes cluster config.
  3. helm repo add spark-operator https://kubeflow.github.io/spark-operator
  4. helm install spark-operator spark-operator/spark-operator \
    --namespace default \
    --set 'image.tag=v1beta2-1.3.3-3.1.1' \
    --set sparkJobNamespace=default

Expected behavior

Spin up the spark operator pod.

Actual behavior

The pod failed with ImagePullBackOff.

I saw the following error:

Failed to pull image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": rpc error: code = NotFound desc = failed to pull and unpack image "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": failed to resolve reference "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1": ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1: not found

The errors started on 04/13/2024 at 1:00 AM.

Terminal Output Screenshot(s)

[Screenshots: 2024-04-17 at 3:14:30 PM and 2024-04-17 at 3:14:38 PM]

Environment & Versions

  • Spark Operator App version: 3.1.1
  • Helm Chart Version: v3.12.3
  • Kubernetes Version: v1.28.7-gke.1026000
  • Apache Spark version:

Additional context

vzhao12 avatar Apr 17 '24 22:04 vzhao12

I checked https://github.com/kubeflow/spark-operator/pkgs/container/spark-operator and it looks like we didn't publish version v1beta2-1.3.3-3.1.1 at all.

@yuchaoran2011 Can you push this version to fix the issue? Thanks
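For anyone who wants to script the same registry check instead of browsing the package page, here is a minimal sketch. The helper name `image_status` is hypothetical, and it assumes the docker CLI is installed and that the registry allows anonymous reads. Note that a non-zero exit from `docker manifest inspect` can also mean a network or authentication problem, not only a missing tag.

```shell
# Hypothetical helper (not part of spark-operator or helm): reports whether
# an image reference resolves in its registry.
# Assumes the docker CLI is available and the registry permits anonymous reads.
image_status() {
  # "docker manifest inspect" exits non-zero when the reference cannot be
  # resolved; network or auth failures look the same from here.
  if docker manifest inspect "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "not found: $1"
  fi
}

# Example call (requires network access):
#   image_status "ghcr.io/kubeflow/spark-operator:v1beta2-1.3.3-3.1.1"
```

Running the example call against each candidate repository before installing should show which one actually hosts the tag you need.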

vzhao12 avatar Apr 17 '24 23:04 vzhao12

The root cause is https://github.com/kubeflow/spark-operator/pull/1937

vzhao12 avatar Apr 17 '24 23:04 vzhao12

/kind bug

bharathk005 avatar Apr 17 '24 23:04 bharathk005

@vzhao12 Until this is addressed, you can use images from the old registry by invoking helm with an extra option:

--set 'image.repository=ghcr.io/googlecloudplatform/spark-operator'
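Combined with the install command from the original report, the workaround might look like the sketch below. The release name, namespace, and tag are copied from the report; whether that exact tag exists in the old registry is an assumption that this thread does not confirm. The snippet only assembles and prints the command so it can be reviewed before running.

```shell
# Assumption: values below come from the original report; the tag's presence
# in the old registry is not verified in this thread.
IMAGE_REPOSITORY="ghcr.io/googlecloudplatform/spark-operator"
IMAGE_TAG="v1beta2-1.3.3-3.1.1"

# Build the helm command with the repository override added.
HELM_CMD="helm install spark-operator spark-operator/spark-operator --namespace default --set image.repository=${IMAGE_REPOSITORY} --set image.tag=${IMAGE_TAG} --set sparkJobNamespace=default"

# Print the command for review instead of executing it.
echo "${HELM_CMD}"
```

Paste the printed command into a terminal once the repository and tag look right. If a release with the same name is already installed, it would need to be removed or upgraded first.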

zevisert avatar Apr 19 '24 23:04 zevisert

@vzhao12 I am still getting an ImagePullBackOff error. Does anyone have an idea? I am using this command:

helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set 'image.repository=ghcr.io/googlecloudplatform/spark-operator'

JunseoChoJJ avatar Apr 22 '24 08:04 JunseoChoJJ

Use 'image.repository=ghcr.io/kubeflow/spark-operator' and 'image.tag=v1beta2-1.4.3-3.5.0'.

iva3682 avatar Apr 22 '24 09:04 iva3682

We just released a new image update with important registry fixes. Check it out:

Image tag: https://github.com/kubeflow/spark-operator/tree/v1beta2-1.4.5-3.5.0
Helm chart: https://github.com/kubeflow/spark-operator/releases/tag/spark-operator-chart-1.2.14

Please give it a try and let us know if you encounter any issues. We're working on a new KubeFlow Spark Operator release and your testing will help make it stable! Feel free to share feedback on the Kubeflow Spark operator channel.

vara-bonthu avatar Apr 26 '24 17:04 vara-bonthu

@vara-bonthu Users will still need to --set=image.repository=... if they are using any tag other than v1beta2-1.4.5-3.5.0 since previous docker images have not yet been replicated to the chart's default repository (docker.io/kubeflow/spark-operator).

Still only one tag exists in the default container registry: https://hub.docker.com/r/kubeflow/spark-operator/tags

Edit: Changed tag to match @RyanZotti's comment

zevisert avatar Apr 26 '24 18:04 zevisert

I think you meant any tag other than v1beta2-1.4.5-3.5.0. The 1.4.3 version isn't available but 1.4.5 is.

RyanZotti avatar Apr 28 '24 21:04 RyanZotti

This issue has been automatically marked as stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days. Thank you for your contributions.

github-actions[bot] avatar Jul 24 '24 01:07 github-actions[bot]