kubeflow-manifests

kubectl 1.25 describe has no `--timeout` flag, breaking kubeflow install

Open • PenumbralFromage opened this issue 1 year ago • 2 comments

Describe the bug

Installing Kubeflow v1.7.0 (using the manifest install pattern, not Terraform) fails at the step that installs the aws-secrets-sync deployment because of a kubectl syntax mismatch. `kubectl describe` has no `--timeout` flag, yet utils/utils.py passes one on line 273, so the command errors out and the installation task fails.
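The failing step can be sketched in Python as follows. This is a hypothetical reconstruction, not the actual code in utils/utils.py: the helper names (`build_wait_cmd`, `build_describe_cmd`, `kubectl_wait_pods_sketch`) and the 60-second request timeout are assumptions. The `kubectl wait` command reproduces the one printed in the log below. The diagnostic `kubectl describe` call drops `--timeout` and, if a bound is still wanted, uses kubectl's global `--request-timeout` flag, which every kubectl subcommand accepts.

```python
import shlex
import subprocess


def build_wait_cmd(pods, namespace, key="app", timeout_s=240):
    """Hypothetical helper: build the `kubectl wait` command seen in the log."""
    selector = f"{key} in ({pods})"
    return [
        "kubectl", "wait", "--for=condition=ready", "pod",
        "-l", selector, f"--timeout={timeout_s}s", "-n", namespace,
    ]


def build_describe_cmd(pods, namespace, key="app", request_timeout_s=60):
    """Hypothetical helper: build a diagnostic `kubectl describe` command.

    `kubectl describe` rejects --timeout; the global --request-timeout flag
    bounds each API request instead (60s is an assumed default).
    """
    selector = f"{key} in ({pods})"
    return [
        "kubectl", "describe", "pod",
        "-l", selector, "-n", namespace,
        f"--request-timeout={request_timeout_s}s",
    ]


def kubectl_wait_pods_sketch(pods, namespace, key="app", timeout_s=240):
    """Hypothetical runner: wait for pods, dump diagnostics on failure.

    Requires kubectl on PATH.
    """
    wait_cmd = build_wait_cmd(pods, namespace, key, timeout_s)
    print("running command:", shlex.join(wait_cmd))
    if subprocess.run(wait_cmd).returncode != 0:
        # Describe the pods for debugging only; its exit status is ignored
        # so the reported failure is the wait itself.
        subprocess.run(build_describe_cmd(pods, namespace, key))
        raise RuntimeError("Timeout/error waiting for pod condition")
```

Keeping command construction separate from execution makes it easy to check the exact argument list without a cluster; note that `shlex.join` needs Python 3.8 or later, which matches the interpreter in the traceback.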

Steps To Reproduce

  1. Install all prerequisites for cognito-rds-s3.
  2. Run `make deploy-kubeflow INSTALLATION_OPTION=kustomize DEPLOYMENT_OPTION=cognito-rds-s3 PIPELINE_S3_CREDENTIAL_OPTION=irsa`.
  3. See the error (logs below).

Expected behavior

Kubeflow installs, and all commands complete without error.

Environment

  • Kubernetes version: v1.25.16-eks-8cb36c9

  • Using EKS (yes/no), if so version? Yes, v1.25.16-eks-8cb36c9

  • Kubeflow version: v1.7.0

  • kubectl version: Client Version v1.25.0, Kustomize Version v4.5.7, Server Version v1.25.16-eks-8cb36c9

  • AWS build number: AWS_RELEASE_VERSION="v1.7.0-aws-b1.0.3"

  • AWS service targeted (S3, RDS, etc.): S3, Cognito, RDS, EKS

Screenshots

No screenshots; here are the logs:

...
==========Installing aws-secrets-manager==========
# Warning: 'bases' is deprecated. Please use 'resources' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
deployment.apps/aws-secrets-sync unchanged
Warning: secrets-store.csi.x-k8s.io/v1alpha1 is deprecated. Use secrets-store.csi.x-k8s.io/v1 instead.
secretproviderclass.secrets-store.csi.x-k8s.io/rds-secret unchanged
Waiting for aws-secrets-manager pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app in (aws-secrets-sync)' --timeout=240s -n kubeflow
error: no matching resources found
error: unknown flag: --timeout
See 'kubectl describe --help' for usage.
Waiting for aws-secrets-manager pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app in (aws-secrets-sync)' --timeout=240s -n kubeflow
error: no matching resources found
error: unknown flag: --timeout
See 'kubectl describe --help' for usage.
Waiting for aws-secrets-manager pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app in (aws-secrets-sync)' --timeout=240s -n kubeflow
error: no matching resources found
error: unknown flag: --timeout
See 'kubectl describe --help' for usage.
Traceback (most recent call last):
  File "utils/kubeflow_installation.py", line 324, in <module>
    install_kubeflow(
  File "utils/kubeflow_installation.py", line 101, in install_kubeflow
    install_component(
  File "utils/kubeflow_installation.py", line 180, in install_component
    validate_component_installation(installation_config, component_name)
  File "/usr/local/lib/python3.8/dist-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python3.8/dist-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/usr/local/lib/python3.8/dist-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "utils/kubeflow_installation.py", line 192, in validate_component_installation
    kubectl_wait_pods(value, namespace, key)
  File "/kube/tests/e2e/utils/utils.py", line 275, in kubectl_wait_pods
    raise Exception("Timeout/error waiting for pod condition")
Exception: Timeout/error waiting for pod condition
...

Additional context

AWS, EKS, Kubeflow 1.7.0

PenumbralFromage, Jan 08 '24 21:01

I can confirm this is the case!

pythonking6, Apr 08 '24 00:04

I have also run into this issue. I've just put in PR #821 to resolve this.

panasenco, Jun 12 '24 19:06