feature: zalando.org/forward-backend annotation support to enable migration to eks
Usage:
apiVersion: zalando.org/v1
kind: StackSet
metadata:
  annotations:
    zalando.org/forward-backend: eks migration # the value does not matter
..
This will execute the migration preparation, such that the next traffic switch will send the traffic to the forward backend.
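As the usage comment notes, only the presence of the annotation matters, not its value. A minimal Go sketch of such a presence check follows; `hasForwardBackend` is a hypothetical helper written for illustration and is not taken from this PR's diff.

```go
package main

import "fmt"

// forwardBackendAnnotation is the annotation key introduced by this PR.
const forwardBackendAnnotation = "zalando.org/forward-backend"

// hasForwardBackend is a hypothetical helper (an assumption, not the PR's code).
// It reports whether the annotation key is present and ignores the value.
func hasForwardBackend(annotations map[string]string) bool {
	// Reading from a nil map is safe in Go and yields ok == false.
	_, ok := annotations[forwardBackendAnnotation]
	return ok
}

func main() {
	annotated := map[string]string{forwardBackendAnnotation: "eks migration"}
	emptyValue := map[string]string{forwardBackendAnnotation: ""}

	fmt.Println(hasForwardBackend(annotated))  // true
	fmt.Println(hasForwardBackend(emptyValue)) // true: the value does not matter
	fmt.Println(hasForwardBackend(nil))        // false: no annotations at all
}
```

Checking key presence rather than comparing the value keeps the behaviour consistent with the "value does not matter" note.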
Changes should be moved to the GenerateRouteGroup/GenerateDeployment etc. functions to fit into the current model.
Not sure if this is major or minor. I lean towards minor, but the number of files changed feels like a bit more than minor.
The current PR crashes with recovered panics at runtime. I think stackset-controller wants to have a deployment.
time="2025-11-17T20:27:34Z" level=info msg="Event(v1.ObjectReference{Kind:\"StackSet\", Namespace:\"default\", Name:\"test-migration-app\", UID:\"11a0c501-1333-4c7a-bb8a-3b58eb68156e\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"2692037662\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedManageStackSet' panic: runtime error: invalid memory address or nil pointer dereference"
time="2025-11-17T20:27:43Z" level=error msg="Encountered a panic while processing a stackset: runtime error: invalid memory address or nil pointer dereference\ngoroutine 153 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStackSet.func1()\n\t/workspace/controller/stackset.go:1089 +0x1e5\npanic({0x1c556c0?, 0x32f91b0?})\n\t/usr/local/go/src/runtime/panic.go:783 +0x132\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStackDeployment(0xc000000540, {0x2237130, 0xc000109b30}, 0xc000245808, 0x0, 0x0?)\n\t/workspace/controller/stack_resources.go:49 +0x3c1\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStackResources(0xc000000540, {0x2237130, 0xc000109b30}, 0xc0004aee80?, 0xc0000c7ba0)\n\t/workspace/controller/stackset.go:1065 +0x265\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStackSet(0xc000000540, {0x2237130, 0xc000109b30}, 0xc0004aee80)\n\t/workspace/controller/stackset.go:1155 +0xb29\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).Run.func2()\n\t/workspace/controller/stackset.go:188 +0xa5\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:93 +0x50\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1\n\t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x95\n" controller=stackset namespace=default stackset=test-migration-app
time="2025-11-17T20:27:43Z" level=error msg="unable to reconcile a stackset: panic: runtime error: invalid memory address or nil pointer dereference" controller=stackset namespace=default stackset=test-migration-app
time="2025-11-17T20:27:43Z" level=error msg="Failed waiting for reconcilers: panic: runtime error: invalid memory address or nil pointer dereference" controller=stackset
time="2025-11-17T20:27:43Z" level=info msg="Event(v1.ObjectReference{Kind:\"StackSet\", Namespace:\"default\", Name:\"test-migration-app\", UID:\"11a0c501-1333-4c7a-bb8a-3b58eb68156e\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"2692037662\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedManageStackSet' panic: runtime error: invalid memory address or nil pointer dereference"
This was fixed by the commit after the comment
Deployment Checklist
This change falls under the deployment policy.
💁 Since Nov 10th, we are in the RED deployment zone. This means all changes released to production must adhere to the following requirements:
- [ ] Detailed release notes are provided in this PR’s description.
- [ ] Thorough load-testing has been performed, and is documented in the description/comment.
- [ ] You can enable/disable the change via feature toggles, and have confirmed these toggles work as expected.
- [ ] Technical review: A Principal Engineer, Engineering Manager or Head of Engineering have green-lit your changes, and the reviewer is named in the description/comments.
- [ ] Application Owner (Director+) approval is given about the PR, and the approver is named in the description/comments.
👉 Regardless of which boxes you click in this comment, merge/deployment will not be blocked. Reports about deployment policy adherence will be circulated daily.
I am done with this PR from my side. It was tested in a pet cluster with zkubectl traffic.
👍
👍
Can we add an e2e test which shows that when the annotation is set it doesn't create stack resources (like deployment etc.) on the stacks which are created after the annotation is set?
We create a deployment because everything relies on having one. I think an e2e test should cover a proper migration call path, but that is not easily possible with our current e2e test infrastructure. I would therefore say e2e is out of scope here: e2e cases should cover the 80% value case, not every edge case.
:+1:
I think it's possible without too much effort; the existing e2e framework provides most of it.
But let's do this in a separate PR. We will likely need to iterate a bit to make it all work with CDP etc.
:+1: