Skaffold waits indefinitely for pod to stabilize although it is already running

Open DerGary opened this issue 2 years ago • 11 comments

Expected behavior

Skaffold's stabilization check succeeds once the deployed Pod is running.

Actual behavior

Skaffold keeps waiting for the Pod to stabilize until the 10-minute timeout is reached, and then the deployment is canceled.

Information

  • Skaffold version: 1.39.1
  • Operating system: macOS 12.5.1
  • Installed via: Homebrew
  • Contents of skaffold.yaml:
apiVersion: skaffold/v2beta24
kind: Config

deploy:
  kustomize:
    paths:
    - .
  kubeContext: docker-desktop
  • Contents of kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- deployment.yaml
  • Contents of deployment.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: toolbox
spec:
  containers:
  - name: toolbox
    image: alpine:latest
    imagePullPolicy: IfNotPresent
    command: [ "tail", "-f", "/dev/null" ] # keep the container running in the background without doing anything
  terminationGracePeriodSeconds: 0 # terminate instantly; tail does not exit, so the Pod would otherwise wait 30 seconds for termination

Steps to reproduce the behavior

  1. clone https://github.com/DerGary/skaffold-podbug-example
  2. run skaffold dev

Logs

  • Skaffold without debug:
Listing files to watch...
Generating tags...
Checking cache...
Tags used in deployment:
Starting deploy...
 - pod/toolbox created
Waiting for deployments to stabilize...
 - pods: could not stabilize within 10m0s
 - pods failed. Error: could not stabilize within 10m0s.
Cleaning up...
 - pod "toolbox" deleted
1/1 deployment(s) failed
  • kubectl describe pod toolbox:
Name:         toolbox
Namespace:    default
Priority:     0
Node:         docker-desktop/192.168.65.4
Start Time:   Wed, 31 Aug 2022 09:53:37 +0200
Labels:       skaffold.dev/run-id=c55e0baf-ebb7-4b2c-aa6e-ddb580c0207f
Annotations:  <none>
Status:       Running
IP:           10.1.16.235
IPs:
  IP:  10.1.16.235
Containers:
  toolbox:
    Container ID:  docker://2d66b6508d6406f449c3e27d7c1b058100ab1e7bc15ccd993355a90bd59b875f
    Image:         alpine:latest
    Image ID:      docker-pullable://alpine@sha256:bc41182d7ef5ffc53a40b044e725193bc10142a1243f395ee852a8d9730fc2ad
    Port:          <none>
    Host Port:     <none>
    Command:
      tail
      -f
      /dev/null
    State:          Running
      Started:      Wed, 31 Aug 2022 09:53:38 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fbscl (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-fbscl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  38s   default-scheduler  Successfully assigned default/toolbox to docker-desktop
  Normal  Pulled     38s   kubelet            Container image "alpine:latest" already present on machine
  Normal  Created    38s   kubelet            Created container toolbox
  Normal  Started    38s   kubelet            Started container toolbox
  • skaffold dev -vdebug
DEBU[0000] skaffold API not starting as it's not requested  subtask=-1 task=DevLoop
INFO[0000] Skaffold &{Version:v1.39.1 ConfigVersion:skaffold/v2beta29 GitVersion: GitCommit:cd3f6fa3231ae8abf7f028eb7163d74aafd6ae94 BuildDate:2022-06-25T00:11:50Z GoVersion:go1.17.11 Compiler:gc Platform:darwin/amd64 User:}  subtask=-1 task=DevLoop
INFO[0000] Loaded Skaffold defaults from "/Users/macdev/.skaffold/config"  subtask=-1 task=DevLoop
DEBU[0000] config version out of date: upgrading to latest "skaffold/v2beta29"  subtask=-1 task=DevLoop
DEBU[0000] parsed 1 configs from configuration file /Users/macdev/Documents/git/skaffold-bug-example/skaffold.yaml  subtask=-1 task=DevLoop
DEBU[0000] Defaulting build type to local build          subtask=-1 task=DevLoop
INFO[0000] Using kubectl context: docker-desktop         subtask=-1 task=DevLoop
DEBU[0000] getting client config for kubeContext: `docker-desktop`  subtask=-1 task=DevLoop
DEBU[0000] Running command: [minikube version --output=json]  subtask=-1 task=DevLoop
DEBU[0000] setting Docker user agent to skaffold-v1.39.1  subtask=-1 task=DevLoop
DEBU[0000] CLI platforms provided: ""                    subtask=-1 task=DevLoop
DEBU[0000] getting client config for kubeContext: `docker-desktop`  subtask=-1 task=DevLoop
DEBU[0000] platforms detected from active kubernetes cluster nodes: "linux/amd64"  subtask=-1 task=DevLoop
DEBU[0000] Using builder: local                          subtask=-1 task=DevLoop
DEBU[0000] push value not present in NewBuilder, defaulting to false because cluster.PushImages is false  subtask=-1 task=DevLoop
INFO[0000] build concurrency first set to 1 parsed from *local.Builder[0]  subtask=-1 task=DevLoop
INFO[0000] final build concurrency value is 1            subtask=-1 task=DevLoop
Listing files to watch...
DEBU[0000] Executing template &{envTemplate 0xc0000fb8c0 0xc000d8caa0  } with environment map[COLORTERM:truecolor COMMAND_MODE:unix2003 GIT_ASKPASS:/Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass.sh HOME:/Users/macdev LANG:en_GB.UTF-8 LESS:-R LOGNAME:macdev LSCOLORS:Gxfxcxdxbxegedabagacad MallocNanoZone:0 OLDPWD:/Users/macdev/Documents/git/skaffold-bug-example ORIGINAL_XDG_CURRENT_DESKTOP:undefined PAGER:less PATH:/Users/macdev/.rd/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/share/dotnet:~/.dotnet/tools:/Library/Apple/usr/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Users/macdev/.rd/bin PWD:/Users/macdev/Documents/git/skaffold-bug-example SHELL:/bin/zsh SHLVL:1 SSH_AUTH_SOCK:/private/tmp/com.apple.launchd.z0QW2TB2yK/Listeners TERM:xterm-256color TERM_PROGRAM:vscode TERM_PROGRAM_VERSION:1.70.2 TMPDIR:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/ USER:macdev VSCODE_GIT_ASKPASS_EXTRA_ARGS:--ms-enable-electron-run-as-node VSCODE_GIT_ASKPASS_MAIN:/Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass-main.js VSCODE_GIT_ASKPASS_NODE:/Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper.app/Contents/MacOS/Code Helper VSCODE_GIT_IPC_HANDLE:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/vscode-git-2bf68283ae.sock VSCODE_INJECTION:1 XPC_FLAGS:0x0 XPC_SERVICE_NAME:0 ZDOTDIR:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/vscode-zsh ZSH:/Users/macdev/.oh-my-zsh _:/usr/local/bin/skaffold __CFBundleIdentifier:com.microsoft.VSCode __CF_USER_TEXT_ENCODING:0x1F6:0x0:0x0]  subtask=-1 task=DevLoop
INFO[0000] List generated in 1.110424ms                  subtask=-1 task=DevLoop
Generating tags...
INFO[0000] Tags generated in 34.875µs                    subtask=-1 task=Build
Checking cache...
INFO[0000] Cache check completed in 168.349µs            subtask=-1 task=Build
Tags used in deployment:
Starting deploy...
DEBU[0000] getting client config for kubeContext: `docker-desktop`  subtask=-1 task=DevLoop
DEBU[0000] Running command: [kubectl version --client -ojson]  subtask=0 task=Deploy
DEBU[0000] Command output: [{
  "clientVersion": {
    "major": "1",
    "minor": "21",
    "gitVersion": "v1.21.3",
    "gitCommit": "ca643a4d1f7bfe34773c74f79527be4afd95bf39",
    "gitTreeState": "clean",
    "buildDate": "2021-07-15T21:04:39Z",
    "goVersion": "go1.16.6",
    "compiler": "gc",
    "platform": "darwin/amd64"
  }
}
]  subtask=0 task=Deploy
DEBU[0000] Executing template &{envTemplate 0xc000a399e0 0xc00092f220  } with environment map[COLORTERM:truecolor COMMAND_MODE:unix2003 GIT_ASKPASS:/Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass.sh HOME:/Users/macdev LANG:en_GB.UTF-8 LESS:-R LOGNAME:macdev LSCOLORS:Gxfxcxdxbxegedabagacad MallocNanoZone:0 OLDPWD:/Users/macdev/Documents/git/skaffold-bug-example ORIGINAL_XDG_CURRENT_DESKTOP:undefined PAGER:less PATH:/Users/macdev/.rd/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/share/dotnet:~/.dotnet/tools:/Library/Apple/usr/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Users/macdev/.rd/bin PWD:/Users/macdev/Documents/git/skaffold-bug-example SHELL:/bin/zsh SHLVL:1 SSH_AUTH_SOCK:/private/tmp/com.apple.launchd.z0QW2TB2yK/Listeners TERM:xterm-256color TERM_PROGRAM:vscode TERM_PROGRAM_VERSION:1.70.2 TMPDIR:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/ USER:macdev VSCODE_GIT_ASKPASS_EXTRA_ARGS:--ms-enable-electron-run-as-node VSCODE_GIT_ASKPASS_MAIN:/Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass-main.js VSCODE_GIT_ASKPASS_NODE:/Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper.app/Contents/MacOS/Code Helper VSCODE_GIT_IPC_HANDLE:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/vscode-git-2bf68283ae.sock VSCODE_INJECTION:1 XPC_FLAGS:0x0 XPC_SERVICE_NAME:0 ZDOTDIR:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/vscode-zsh ZSH:/Users/macdev/.oh-my-zsh _:/usr/local/bin/skaffold __CFBundleIdentifier:com.microsoft.VSCode __CF_USER_TEXT_ENCODING:0x1F6:0x0:0x0]  subtask=-1 task=DevLoop
DEBU[0000] Running command: [kustomize build .]          subtask=0 task=Deploy
DEBU[0000] Command output: [apiVersion: v1
kind: Pod
metadata:
  name: toolbox
spec:
  containers:
  - command:
    - tail
    - -f
    - /dev/null
    image: alpine:latest
    imagePullPolicy: IfNotPresent
    name: toolbox
  terminationGracePeriodSeconds: 0
]  subtask=0 task=Deploy
DEBU[0000] manifests with tagged images:apiVersion: v1
kind: Pod
metadata:
  name: toolbox
spec:
  containers:
  - command:
    - tail
    - -f
    - /dev/null
    image: alpine:latest
    imagePullPolicy: IfNotPresent
    name: toolbox
  terminationGracePeriodSeconds: 0  subtask=0 task=Deploy
DEBU[0000] manifests with labels apiVersion: v1
kind: Pod
metadata:
  labels:
    skaffold.dev/run-id: c6eec86e-e726-4183-af12-8d4d4e23f536
  name: toolbox
spec:
  containers:
  - command:
    - tail
    - -f
    - /dev/null
    image: alpine:latest
    imagePullPolicy: IfNotPresent
    name: toolbox
  terminationGracePeriodSeconds: 0  subtask=-1 task=DevLoop
DEBU[0000] Running command: [kubectl --context docker-desktop get -f - --ignore-not-found -ojson]  subtask=0 task=Deploy
DEBU[0000] Command output: []                            subtask=0 task=Deploy
DEBU[0000] 1 manifests to deploy. 1 are updated or new   subtask=0 task=Deploy
DEBU[0000] Running command: [kubectl --context docker-desktop apply -f -]  subtask=0 task=Deploy
 - pod/toolbox created
INFO[0000] Deploy completed in 454.900172ms              subtask=-1 task=Deploy
Waiting for deployments to stabilize...
DEBU[0000] getting client config for kubeContext: `docker-desktop`  subtask=-1 task=DevLoop
DEBU[0000] getting client config for kubeContext: `docker-desktop`  subtask=-1 task=DevLoop
DEBU[0000] checking status pods                          subtask=-1 task=Deploy
DEBU[0601] marking resource failed due to error code STATUSCHECK_DEADLINE_EXCEEDED  subtask=-1 task=Deploy
 - pods: could not stabilize within 10m0s
 - pods failed. Error: could not stabilize within 10m0s.
DEBU[0601] setting skaffold deploy status to STATUSCHECK_DEADLINE_EXCEEDED.  subtask=-1 task=Deploy
Cleaning up...
DEBU[0601] Executing template &{envTemplate 0xc000bc70e0 0xc000d8f630  } with environment map[COLORTERM:truecolor COMMAND_MODE:unix2003 GIT_ASKPASS:/Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass.sh HOME:/Users/macdev LANG:en_GB.UTF-8 LESS:-R LOGNAME:macdev LSCOLORS:Gxfxcxdxbxegedabagacad MallocNanoZone:0 OLDPWD:/Users/macdev/Documents/git/skaffold-bug-example ORIGINAL_XDG_CURRENT_DESKTOP:undefined PAGER:less PATH:/Users/macdev/.rd/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/share/dotnet:~/.dotnet/tools:/Library/Apple/usr/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Users/macdev/.rd/bin PWD:/Users/macdev/Documents/git/skaffold-bug-example SHELL:/bin/zsh SHLVL:1 SSH_AUTH_SOCK:/private/tmp/com.apple.launchd.z0QW2TB2yK/Listeners TERM:xterm-256color TERM_PROGRAM:vscode TERM_PROGRAM_VERSION:1.70.2 TMPDIR:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/ USER:macdev VSCODE_GIT_ASKPASS_EXTRA_ARGS:--ms-enable-electron-run-as-node VSCODE_GIT_ASKPASS_MAIN:/Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass-main.js VSCODE_GIT_ASKPASS_NODE:/Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper.app/Contents/MacOS/Code Helper VSCODE_GIT_IPC_HANDLE:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/vscode-git-2bf68283ae.sock VSCODE_INJECTION:1 XPC_FLAGS:0x0 XPC_SERVICE_NAME:0 ZDOTDIR:/var/folders/gq/26phy7rj5kv88dchtpn9c_5w0000gp/T/vscode-zsh ZSH:/Users/macdev/.oh-my-zsh _:/usr/local/bin/skaffold __CFBundleIdentifier:com.microsoft.VSCode __CF_USER_TEXT_ENCODING:0x1F6:0x0:0x0]  subtask=-1 task=DevLoop
DEBU[0601] Running command: [kustomize build .]          subtask=-1 task=DevLoop
DEBU[0601] Command output: [apiVersion: v1
kind: Pod
metadata:
  name: toolbox
spec:
  containers:
  - command:
    - tail
    - -f
    - /dev/null
    image: alpine:latest
    imagePullPolicy: IfNotPresent
    name: toolbox
  terminationGracePeriodSeconds: 0
]  subtask=-1 task=DevLoop
DEBU[0601] Running command: [kubectl --context docker-desktop delete --ignore-not-found=true --wait=false -f -]  subtask=-1 task=DevLoop
 - pod "toolbox" deleted
INFO[0603] Cleanup completed in 2.164 seconds            subtask=-1 task=DevLoop
DEBU[0603] Running command: [tput colors]                subtask=-1 task=DevLoop
DEBU[0603] Command output: [256
]                        subtask=-1 task=DevLoop
1/1 deployment(s) failed
DEBU[0603] exporting metrics                             subtask=-1 task=DevLoop

DerGary, Aug 31 '22 08:08

I deleted kubeContext: docker-desktop from skaffold.yaml as I don't have that k8s context. I cannot reproduce the issue after that.

ericzzzzzzz, Aug 31 '22 15:08

I can reproduce it with both the local docker-desktop Kubernetes and the rancher-desktop Kubernetes. @ericzzzzzzz which Kubernetes cluster are you using? When I try a hosted context in Azure Kubernetes Service (AKS), it works as expected.

DerGary, Sep 01 '22 07:09

@DerGary my bad. I assumed you were using minikube as the Kubernetes cluster for local development. Your test project works with minikube on my machine, but I can now reproduce the issue with the docker-desktop Kubernetes cluster on 1.39.1. I'll look into why this is happening.
The Example/getting-started project works with the docker-desktop cluster, and the test project also works with this cluster when using a binary built from the main branch. This scenario seems to be a special case, so I'm marking this issue as p2 for now.

ericzzzzzzz, Sep 01 '22 15:09

It's interesting that the Example/getting-started project works. I looked deeper into it and found the difference that decides whether it works or not. My example also works when I change kustomize to kubectl in skaffold.yaml, like this:

skaffold.yaml

apiVersion: skaffold/v2beta24
kind: Config

deploy:
  kustomize:
    paths:
    - .

to:

apiVersion: skaffold/v2beta24
kind: Config

deploy:
  kubectl:
    manifests:
      - deployment.yaml

That is not really a solution for us at the moment, but maybe it helps in finding the root cause?

DerGary, Sep 02 '22 06:09

I think I'm experiencing this same bug. I have two Skaffold projects, one that uses deploy.kubectl, and one that uses deploy.kustomize. I'm using Rancher Desktop and also skaffold dev. The project using deploy.kubectl deploys and stabilizes fine, and the project using deploy.kustomize fails to stabilize, erroring with "[image] can't be pulled".

My hypothesis is that kustomize deployer doesn't correctly inject the local docker image (docker://...). I may have more time to investigate later.

kevin-hanselman, Sep 10 '22 15:09

@DerGary @kevin-hanselman

The kustomize deployer reads the k8s config to get the default namespace when it builds the status check monitor. Can you try running kubectl config set-context --current --namespace=default to set the current namespace to default, and then run your test case to see if it works?

ericzzzzzzz, Sep 12 '22 15:09

@ericzzzzzzz I tried this, and it doesn't help. I don't think the issue is with the status monitor's namespace. I think k8s is trying to pull the image(s) from a remote registry when it shouldn't be; the image should be taken from the local Docker instance.

In the kubectl deploy case (i.e. the working case), in the server-side Pod YAML, under status.containerStatuses, imageID is set to docker://....

In the kustomize deploy case (i.e. the broken case), in the server-side Pod YAML, under status.containerStatuses, imageID is empty.

To reiterate and clarify: I am running Rancher Desktop, and I have configured Skaffold to recognize it as a local cluster:

$ skaffold config list
skaffold config:
kube-context: rancher-desktop
local-cluster: true

kevin-hanselman, Sep 12 '22 16:09

Hey @kevin-hanselman, the issue you're experiencing is different from @DerGary's. In his case:

  • The container is already running; Skaffold just cannot get its status from the cluster.

In your case:

  • The image cannot be pulled from the registry when deploying to a rancher-desktop cluster with the kustomize deployer.

If this is true, please open another issue. It would be great if you could provide more details so that we can reproduce it. Thanks.

ericzzzzzzz, Sep 12 '22 17:09

@ericzzzzzzz Thanks for clarifying. I began working on a minimal reproducible example, and I found the source of the issue I'm experiencing. I had my Pod's container configured with imagePullPolicy: Always, so k8s will always try to pull it from a registry. This is my mistake. Sorry for adding noise to this issue.
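A quick way to spot this kind of misconfiguration is to scan the Pod manifest for containers that force a pull. The snippet below is only a minimal sketch: containers_forcing_pull is a hypothetical helper, the manifest is assumed to be already parsed into a dict, and only an explicit imagePullPolicy: Always is detected (Kubernetes may also default the policy when the field is omitted, which this sketch ignores).

```python
def containers_forcing_pull(pod: dict) -> list:
    """Return the names of containers whose imagePullPolicy is explicitly "Always"."""
    spec = pod.get("spec") or {}
    # Check regular containers and init containers alike.
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    return [c.get("name") for c in containers if c.get("imagePullPolicy") == "Always"]


# Hypothetical manifest: one container forces a pull, the other does not.
example_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example"},
    "spec": {
        "containers": [
            {"name": "app", "image": "my-app", "imagePullPolicy": "Always"},
            {"name": "sidecar", "image": "alpine:3.16", "imagePullPolicy": "IfNotPresent"},
        ]
    },
}

print(containers_forcing_pull(example_pod))  # prints ['app']
```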

kevin-hanselman, Sep 12 '22 19:09

@ericzzzzzzz

kubectl config set-context --current --namespace=default

I don't really understand what this does, because my namespace was default all along, but after executing it I can't reproduce the issue anymore. (I have also upgraded Skaffold to 1.39.2 since then, which shouldn't make a difference, I think?)

DerGary, Sep 13 '22 06:09

@DerGary

You can reproduce the issue again by resetting your docker-desktop cluster (I wouldn't recommend this, though) or by running kubectl config set-context --current --namespace='' to set the namespace preference of your current context to an empty string. Then verify the namespace setting in your context config with kubectl config view --minify | grep namespace: (it should print nothing this time). Now run your test project: the issue is still there, even with Skaffold 1.39.2.

After testing this, you can set the namespace back to default with kubectl config set-context --current --namespace=default. Run kubectl config view --minify | grep namespace: again, and namespace: default should be in the output. Now run skaffold dev and everything should work.

The problem is that the docker-desktop context has no namespace preference in its config, so the kustomize deployer uses an empty string as the namespace when running the status check for pods. Your apps, however, are deployed to the default namespace, so the status monitor keeps getting nothing back from the docker-desktop cluster and retrying until it times out. That's why setting the namespace manually fixes the issue.
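The mismatch can be illustrated with a short sketch. This is not Skaffold's actual code: context_namespace and effective_namespace are hypothetical names, and the input is assumed to be the JSON printed by kubectl config view --minify -o json. The fallback to default mirrors where kubectl places manifests that carry no namespace.

```python
import json


def context_namespace(kubeconfig_json: str):
    """Return the namespace preference of the current context, or None if unset."""
    config = json.loads(kubeconfig_json)
    current = config.get("current-context")
    for entry in config.get("contexts") or []:
        if entry.get("name") == current:
            # An empty string is treated the same as a missing preference.
            return (entry.get("context") or {}).get("namespace") or None
    return None


def effective_namespace(kubeconfig_json: str) -> str:
    """Fall back to "default" when the context has no namespace preference."""
    return context_namespace(kubeconfig_json) or "default"


# Sample input shaped like a docker-desktop context without a namespace preference.
sample = json.dumps({
    "current-context": "docker-desktop",
    "contexts": [
        {"name": "docker-desktop",
         "context": {"cluster": "docker-desktop", "user": "docker-desktop"}}
    ],
})

print(context_namespace(sample))    # prints None
print(effective_namespace(sample))  # prints default
```

A status check that queried the effective namespace rather than the raw, empty preference would look in the same place the Pod was deployed to.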

Skaffold built from the main branch doesn't have this problem: during the schema upgrade, the main branch actually replaces the kustomize deployer with the kubectl deployer.

ericzzzzzzz, Sep 13 '22 15:09

I saw similar behavior when I waited for a rollout manually with the following commands.

skaffold run --status-check=false
kubectl rollout status deployment/<app>

However, the following commands don't have the problem.

# Deploy first, but don't wait for the rollout.
skaffold run --status-check=false
# Then deploy again.
skaffold run

On the second run, I expected deployment.apps/<app> unchanged, but it prints deployment.apps/<app> configured.

foriequal0, Oct 14 '22 05:10