skaffold Statefulsets need a moment to stabilize / not abort skaffold dev during launch


Status: Open. DGollings opened this issue 3 years ago (5 comments).

Expected behavior

skaffold dev starts, tolerating a little failure while the StatefulSet stabilizes.

Actual behavior

Skaffold sees an error and terminates everything

Information

Skaffold version: 1.35

Steps to reproduce the behavior

1. Have a StatefulSet (I can paste the contents of nats/stan if requested).
2. Run skaffold dev.

statefulset/stan: container stan is backing off waiting to restart
 - pod/stan-0: container stan is backing off waiting to restart
   [stan-0 stan] [1] 2021/11/26 09:11:31.878330 [INF] STREAM: Starting nats-streaming-server[stan] version 0.16.2
   [stan-0 stan] [1] 2021/11/26 09:11:31.878355 [INF] STREAM: ServerID: DDROjri7DXdxYBnHLqvfWF
   [stan-0 stan] [1] 2021/11/26 09:11:31.878356 [INF] STREAM: Go version: go1.11.13
   [stan-0 stan] [1] 2021/11/26 09:11:31.878357 [INF] STREAM: Git commit: [910d6e1]
   [stan-0 stan] [1] 2021/11/26 09:11:31.881090 [INF] STREAM: Shutting down.
   [stan-0 stan] [1] 2021/11/26 09:11:31.881121 [FTL] STREAM: Failed to start: nats: no servers available for connection
statefulset/stan failed. Error: container stan is backing off waiting to restart.

As stan depends on nats, this is completely normal behaviour: stan simply dies and tries again until nats is available, and is thus 'impossible' to fix.

Downgrading to a version below 1.35 instantly fixes the issue.

Related to #4158, #6205 and in particular #6828

Nov 26 '21 12:11 DGollings
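The "simply die and try again" behaviour relies on the kubelet restarting the crashed stan container; between attempts the pod reports the "backing off waiting to restart" state seen in the log above. A simplified sketch of that restart schedule, assuming the back-off described in the Kubernetes Pod lifecycle documentation (10 s, doubling after each crash, capped at five minutes) and ignoring how long each failed start takes. `restart_times` is a hypothetical helper, not part of Kubernetes or Skaffold.

```python
# Hypothetical, simplified model of the kubelet's restart back-off for a
# crash-looping container. Assumptions: delays start at 10 s, double after
# every crash, are capped at 300 s, and each failed start takes no time.

def restart_times(dependency_ready_at: float, initial: float = 10.0,
                  cap: float = 300.0) -> list[float]:
    """Return the times (seconds) at which the container is started, ending
    with the first start at or after the dependency (nats) is reachable."""
    t = 0.0
    delay = initial
    starts = [t]
    while t < dependency_ready_at:
        # stan exits immediately ("no servers available for connection"),
        # so the kubelet backs off for `delay` seconds before restarting it.
        t += delay
        starts.append(t)
        delay = min(delay * 2, cap)
    return starts


if __name__ == "__main__":
    # Example: nats becomes reachable 45 s after stan-0 first starts.
    print(restart_times(45))  # [0.0, 10.0, 30.0, 70.0]
```

In this model, with nats reachable after 45 s, stan is started at 0, 10, 30 and 70 s; a status check that treats the first back-off as fatal aborts long before the pod could stabilize.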

@gsquared94 can you add any information here regarding whether this is intended behaviour from https://github.com/GoogleContainerTools/skaffold/pull/6828, and what possible short-term/long-term fixes there might be for this issue?

Nov 29 '21 17:11 aaron-prindle

@DGollings thanks for the issue. We recently added a status check for StatefulSets. Looks like the ask is to ignore this failure.
Does skaffold dev exit on the first occurrence of this failure? If not, have you tried using the statusCheckDeadlineSeconds config field and bumping the value?

Jan 10 '22 19:01 tejal29

> Looks like the ask is to ignore this failure. Does skaffold dev exit on the first occurrence of this failure?

Yes.

> If not, have you tried using the statusCheckDeadlineSeconds config field and bumping the value?

It was already set to 600 seconds, but skaffold dev still dies instantly.

Jan 17 '22 14:01 DGollings

Assigning this to @aaron-prindle. They are looking into it.

May 09 '22 18:05 tejal29

We made a fix for Autopilot clusters, which was released in v2.0.0-beta2. Note: this is not available in Cloud Code.

Sep 02 '22 16:09 tejal29

The issue can be fixed by adding the --tolerate-failures-until-deadline flag when running skaffold dev (implementation: #8047).

Nov 09 '22 19:11 ericzzzzzzz
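The difference between the two behaviours discussed in this thread (skaffold dev exiting on the first failure even with statusCheckDeadlineSeconds at 600, versus tolerating failures until the deadline) can be illustrated with a small model. This is a hypothetical sketch under stated assumptions, not Skaffold's implementation: the `status_check` function, the observation format and the state strings are all made up for illustration.

```python
# Hypothetical model of a deployment status check, NOT Skaffold's actual code.
# Each observation is (seconds since deploy, state), where state is one of
# "pending", "failing" (e.g. "backing off waiting to restart") or "ready".
# Assumes `observations` covers the whole window up to the deadline.

def status_check(observations: list[tuple[float, str]], deadline: float,
                 tolerate_failures_until_deadline: bool) -> str:
    for at, state in observations:
        if at > deadline:
            # Anything observed after the deadline no longer counts.
            break
        if state == "ready":
            return "success"
        if state == "failing" and not tolerate_failures_until_deadline:
            # Fail fast: the first failure ends the session, whatever the deadline.
            return "failed"
    return "deadline exceeded"


if __name__ == "__main__":
    # Hypothetical stan-0 timeline: crash-looping until nats is reachable.
    stan = [(5, "failing"), (15, "pending"), (35, "failing"), (75, "ready")]
    print(status_check(stan, deadline=600, tolerate_failures_until_deadline=False))  # failed
    print(status_check(stan, deadline=600, tolerate_failures_until_deadline=True))   # success
```

Under the fail-fast policy, raising the deadline changes nothing, because the first failing observation ends the check; only when failures are tolerated does the deadline decide the outcome.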
