argo-rollouts
experiments: recovered from panic, on v1.4
When running an experiment, a number of nil pointer panics are thrown before the experiment eventually passes and moseys on its merry way.
We use Istio in our environment.
The panics seem to resolve themselves after a dozen or more failures.
To Reproduce
spec:
  analysis:
    successfulRunHistoryLimit: 1
    unsuccessfulRunHistoryLimit: 1
  progressDeadlineAbort: true
  replicas: 2
  revisionHistoryLimit: 5
  strategy:
    canary:
      steps:
        - experiment:
            analyses:
              - args:
                  - name: api-root-url
                    value: >-
                      http://example-preview.default.svc.cluster.local:8080
                  - name: request-timeout-milliseconds
                    value: '1000'
                name: example-smoke-test-analysis
                requiredForCompletion: true
                templateName: example-smoke-test-analysis
            duration: 5m
            templates:
              - metadata:
                  labels:
                    app.kubernetes.io/name: example-preview
                name: preview
                replicas: 1
                selector:
                  matchLabels:
                    app.kubernetes.io/name: example-preview
                specRef: canary
        - setWeight: 5
        - pause:
            duration: 10m
      trafficRouting:
        istio:
          destinationRule:
            canarySubsetName: canary
            name: example-mesh-destination
            stableSubsetName: stable
          virtualService:
            name: example-mesh-virtualservice
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-svc-deployment
Expected behavior
The logs should not contain nil pointer exceptions.
Version
1.4
Logs
Recovered from panic: runtime error: invalid memory address or nil pointer dereference
goroutine 325 [running]:
runtime/debug.Stack()
    /usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1.1()
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:149 +0x58
panic({0x21c1260, 0x3c0ff10})
    /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).calculateWeightDestinationsFromExperiment(0xc00198ac00)
    /go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting.go:333 +0x1f7
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcileTrafficRouting(0xc00198ac00)
    /go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting.go:176 +0x7c5
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).rolloutCanary(0xc00198ac00)
    /go/src/github.com/argoproj/argo-rollouts/rollout/canary.go:56 +0x166
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcile(0xc00198ac00)
    /go/src/github.com/argoproj/argo-rollouts/rollout/context.go:86 +0xe7
github.com/argoproj/argo-rollouts/rollout.(*Controller).syncHandler(0xc0000e48c0, {0x29d81b0, 0xc000806c00}, {0xc0023f39b0, 0x27})
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:415 +0x4d3
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1()
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:153 +0x89
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1({0x29e4220?, 0xc000522520}, {0x25881fc, 0x7}, 0xc0009a7e70, {0x29d81b0, 0xc000806c00}, 0xc0005812c0?, {0x2093520, 0xc001bdde70})
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:157 +0x40b
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem({0x29d81b0, 0xc000806c00}, {0x29e4220, 0xc000522520}, {0x25881fc, 0x7}, 0x0?, 0x0?)
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:171 +0xbf
github.com/argoproj/argo-rollouts/utils/controller.RunWorker(...)
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:104
github.com/argoproj/argo-rollouts/rollout.(*Controller).Run.func1()
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:336 +0xbe
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x29b80e0, 0xc00150ccf0}, 0x1, 0xc00017d0e0)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90 +0x25
created by github.com/argoproj/argo-rollouts/rollout.(*Controller).Run
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:335 +0xa7
@Lykathia Can you try v1.4.1?
@zachaller yep, PR in - will get it reviewed and tested on Monday and report back. Thanks!
1.4.1 does not appear to have addressed the issue.
Recovered from panic: runtime error: invalid memory address or nil pointer dereference
goroutine 337 [running]:
runtime/debug.Stack()
    /usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1.1()
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:149 +0x58
panic({0x21c24a0, 0x3c11f30})
    /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).calculateWeightDestinationsFromExperiment(0xc0019b8c00)
    /go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting.go:333 +0x1f7
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcileTrafficRouting(0xc0019b8c00)
    /go/src/github.com/argoproj/argo-rollouts/rollout/trafficrouting.go:176 +0x7c5
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).rolloutCanary(0xc0019b8c00)
    /go/src/github.com/argoproj/argo-rollouts/rollout/canary.go:56 +0x166
github.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcile(0xc0019b8c00)
    /go/src/github.com/argoproj/argo-rollouts/rollout/context.go:86 +0xe7
github.com/argoproj/argo-rollouts/rollout.(*Controller).syncHandler(0xc000210700, {0x29d99f0, 0xc0002ea100}, {0xc002159dd0, 0x27})
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:415 +0x4d3
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1()
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:153 +0x89
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1({0x29e5a60?, 0xc000512c20}, {0x25899bc, 0x7}, 0xc000e4de70, {0x29d99f0, 0xc0002ea100}, 0xc000317080?, {0x2094760, 0xc002dc11d0})
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:157 +0x40b
github.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem({0x29d99f0, 0xc0002ea100}, {0x29e5a60, 0xc000512c20}, {0x25899bc, 0x7}, 0x2094760?, 0x29b0250?)
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:171 +0xbf
github.com/argoproj/argo-rollouts/utils/controller.RunWorker(...)
    /go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:104
github.com/argoproj/argo-rollouts/rollout.(*Controller).Run.func1()
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:336 +0xbe
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x3332303622?)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x3a22736c6562616c?, {0x29b9900, 0xc000d63590}, 0x1, 0xc00055d740)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x706d65742d646f70?, 0x3b9aca00, 0x0, 0x37?, 0x61746f6e6e61222c?)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x6a6f72706f677261?, 0x636e79732f6f692e?, 0x223a22657661772d?)
    /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90 +0x25
created by github.com/argoproj/argo-rollouts/rollout.(*Controller).Run
    /go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:335 +0xa7
@Lykathia I would love to see your Rollout object with the status field intact; if you could share that, it would be awesome. I'd also like to know whether you are trying to use a weight on your experiment, which I should be able to gather from your Rollout object.
Sure thing! Some nouns have been edited and some annotations/labels removed:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: '10'
    rollout.argoproj.io/revision: '46'
    rollout.argoproj.io/workload-generation: '52'
  creationTimestamp: '2022-06-03T17:50:51Z'
  generation: 39
  labels:
    app.kubernetes.io/instance: production-eks-prod-examplens-example
  name: example-rollout-c89ac2e8
  namespace: examplens
  resourceVersion: '614921626'
  uid: 86e21dfc-7adf-40f5-a9de-b5d368b0a843
spec:
  analysis:
    successfulRunHistoryLimit: 1
    unsuccessfulRunHistoryLimit: 1
  progressDeadlineAbort: true
  replicas: 2
  restartAt: '2022-11-09T19:40:00Z'
  revisionHistoryLimit: 5
  strategy:
    canary:
      steps:
        - experiment:
            analyses:
              - args:
                  - name: api-root-url
                    value: >-
                      http://example-rollout-preview.examplens.svc.cluster.local:8080
                  - name: request-timeout-milliseconds
                    value: '1000'
                name: example-smoke-test-analysis
                requiredForCompletion: true
                templateName: example-smoke-test-analysis
            duration: 5m
            templates:
              - metadata:
                  labels:
                    app.kubernetes.io/name: example-rollout-preview
                name: preview
                replicas: 1
                selector:
                  matchLabels:
                    app.kubernetes.io/name: example-rollout-preview
                specRef: canary
        - setWeight: 5
        - pause:
            duration: 10m
      trafficRouting:
        istio:
          destinationRule:
            canarySubsetName: canary
            name: example-mesh-destination-c833f030
            stableSubsetName: stable
          virtualService:
            name: example-mesh-virtualservice-c8e2aa97
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-svc-deployment-c8e30830
status:
  HPAReplicas: 2
  availableReplicas: 2
  blueGreen: {}
  canary:
    weights:
      canary:
        podTemplateHash: 5ccdfd574c
        weight: 0
      stable:
        podTemplateHash: 5ccdfd574c
        weight: 100
  conditions:
    - lastTransitionTime: '2023-02-27T14:45:05Z'
      lastUpdateTime: '2023-02-27T14:45:05Z'
      message: Rollout has minimum availability
      reason: AvailableReason
      status: 'True'
      type: Available
    - lastTransitionTime: '2023-02-27T15:12:21Z'
      lastUpdateTime: '2023-02-27T15:12:21Z'
      message: Rollout is paused
      reason: RolloutPaused
      status: 'False'
      type: Paused
    - lastTransitionTime: '2023-02-27T15:12:32Z'
      lastUpdateTime: '2023-02-27T15:12:32Z'
      message: RolloutCompleted
      reason: RolloutCompleted
      status: 'True'
      type: Completed
    - lastTransitionTime: '2023-02-27T15:13:02Z'
      lastUpdateTime: '2023-02-27T15:13:02Z'
      message: Rollout is healthy
      reason: RolloutHealthy
      status: 'True'
      type: Healthy
    - lastTransitionTime: '2023-02-27T15:12:21Z'
      lastUpdateTime: '2023-02-27T15:13:02Z'
      message: >-
        ReplicaSet "example-rollout-c89ac2e8-5ccdfd574c" has successfully
        progressed.
      reason: NewReplicaSetAvailable
      status: 'True'
      type: Progressing
  currentPodHash: 5ccdfd574c
  currentStepHash: 75dcbd8f68
  currentStepIndex: 3
  observedGeneration: '39'
  phase: Healthy
  readyReplicas: 2
  replicas: 2
  restartedAt: '2022-11-09T19:40:00Z'
  selector: app.kubernetes.io/name=example-svc-c8333369
  stableRS: 5ccdfd574c
  updatedReplicas: 2
  workloadObservedGeneration: '52'
Everything seems to complete fine and as expected; the logs are just exploding with the nil pointer panics.
Maybe related https://github.com/argoproj/argo-rollouts/pull/2734
I saw similar exceptions occurring when calculateWeightDestinationsFromExperiment was called. My guess was that Argo was calling this function to determine the weight the experiment should have, and it was returning nil or an unexpected object. Argo would then try to adjust the VirtualService weight to that value, Istio's admission controller would reject the invalid value, and Argo would fail to handle the error gracefully.
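As a hypothetical sketch of that failure mode (the type and function names below are illustrative, not the actual argo-rollouts code), the panic at trafficrouting.go:333 is consistent with dereferencing an optional weight pointer without a nil check; guarding the dereference avoids the panic:

```go
package main

import "fmt"

// ExperimentTemplate is a stand-in for an experiment step template whose
// weight is optional in the spec and therefore modeled as a pointer.
type ExperimentTemplate struct {
	Name   string
	Weight *int32 // nil when the user did not set a weight
}

// weightFor returns the template's weight, treating a missing (nil) weight
// as zero instead of dereferencing the nil pointer and panicking.
func weightFor(t ExperimentTemplate) int32 {
	if t.Weight == nil {
		return 0
	}
	return *t.Weight
}

func main() {
	w := int32(20)
	fmt.Println(weightFor(ExperimentTemplate{Name: "preview"}))            // no weight set
	fmt.Println(weightFor(ExperimentTemplate{Name: "canary", Weight: &w})) // explicit weight
}
```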
I solved it by adding a weight: # to my experiment.
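For anyone hitting the same panic, a sketch of what that workaround looks like in the canary step shared above (the placement of the optional weight field on the experiment template and the value 5 are illustrative, not taken from this thread):

```yaml
strategy:
  canary:
    steps:
      - experiment:
          duration: 5m
          templates:
            - name: preview
              specRef: canary
              replicas: 1
              # explicit weight so the controller never has to derive one
              weight: 5
```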