Updating Ports with correctDrift enabled using multiple-paths repo triggers an error
Issue
Error triggered after updating ports with correctDrift enabled while using multiple-paths repo
Reproduction steps
- Install Rancher 2.8-head with 3 downstream clusters
- Created a GitRepo with correctDrift enabled (https://github.com/rancher/fleet-test-data, path: multiple-paths) and deployed it to all downstream clusters
- Navigated to DS cluster --> Services
- Edited the service by updating its port.
- Waited for 30 seconds.
- Observations on Rancher 2.8.5:
- GitRepo is in Modified state with Error (see screenshot).
- Navigated to Continuous Delivery --> Clusters.
- DS cluster on which service was deployed/created is in Modified state.
- Above observation is seen in Rancher 2.8.5+ ~~2.8-head but not in 2.9.0-alpha7~~. Edit: Observed in 2.9 as well IF Force update is applied
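For reference, the GitRepo from the reproduction steps can be sketched as a manifest. This is only a minimal sketch, not the exact resource used in the report: the name `test-drift` and the `fleet-default` namespace are assumptions inferred from the bundle deployment names in the logs, the helper `build_gitrepo` is hypothetical, and cluster targeting is left out on purpose.

```python
import json


def build_gitrepo(name, namespace, repo, paths, correct_drift=True, force=False):
    """Return a Fleet GitRepo manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "fleet.cattle.io/v1alpha1",
        "kind": "GitRepo",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "repo": repo,
            "paths": paths,
            # Drift correction enabled, force mode off, as in the steps above.
            "correctDrift": {"enabled": correct_drift, "force": force},
        },
    }


gitrepo = build_gitrepo(
    name="test-drift",          # assumption: not stated in the report
    namespace="fleet-default",  # assumption: not stated in the report
    repo="https://github.com/rancher/fleet-test-data",
    paths=["multiple-paths"],
)

# JSON is printed here because the standard library has no YAML writer.
print(json.dumps(gitrepo, indent=2))
```

Targets for the downstream clusters would still need to be added to `spec` to match the setup described in the steps.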
This check has now been added to the ui/e2e CI here. A video can be downloaded from the CI artifacts; the issue can be seen by playing the p1.specs.ts part at minute 8:02
Not sure if the Modified state is somehow related to https://github.com/rancher/dashboard/issues/11404
FTR: After adding latest checks for this issue on our ci, 2.8-head did not show this error today, but 2.7-head did.
I just performed a manual check and it seems there was a caveat in our automation.
We removed the click on force update from our tests. For some reason in 2.7 this was still ok and spotted the issue, but in 2.9 it would remain unnoticed until clicked. After doing this the issue appeared:
I will keep investigating tomorrow morning
Seems related to how helm upgrades services for us. See https://github.com/kubernetes/kubernetes/issues/105610
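To illustrate why the patch can end up with a duplicate port name, here is a simplified model of a keyed list merge. It is not Helm's or Kubernetes' actual patch code. It assumes that Service ports are matched on their `port` field and that entries found only in the live object are left in place rather than deleted. The port numbers are hypothetical; only the name `required-name2` comes from the error message.

```python
def merge_ports_by_key(live_ports, desired_ports, key="port"):
    """Merge desired entries into live ones, matching entries on `key`.

    Entries whose key exists only in the live list are kept, which mimics a
    patch that adds or updates entries but never deletes them.
    """
    merged = [dict(p) for p in live_ports]
    index = {p[key]: i for i, p in enumerate(merged)}
    for desired in desired_ports:
        if desired[key] in index:
            merged[index[desired[key]]].update(desired)
        else:
            index[desired[key]] = len(merged)
            merged.append(dict(desired))
    return merged


def duplicate_port_names(ports):
    """Return one error string per port whose name was already used."""
    seen, errors = set(), []
    for i, port in enumerate(ports):
        name = port.get("name")
        if name is not None and name in seen:
            errors.append(f'spec.ports[{i}].name: Duplicate value: "{name}"')
        seen.add(name)
    return errors


# Hypothetical port numbers: the chart declares 80, the user edited it to 8081.
desired = [{"name": "required-name2", "port": 80}]
live = [{"name": "required-name2", "port": 8081}]

merged = merge_ports_by_key(live, desired)
errors = duplicate_port_names(merged)
print(merged)
print(errors)
```

In this model the edited entry no longer matches the declared one on its key, so the declared entry is appended instead of replacing it, and both entries carry the same name.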
fleet-agent should store the error in the bundledeployments status
Adding error logs
{
"level": "error",
"ts": "2024-07-10T09:02:36Z",
"logger": "bundledeployment",
"msg": "Failed to deploy bundle",
"controller": "bundledeployment",
"controllerGroup": "fleet.cattle.io",
"controllerKind": "BundleDeployment",
"BundleDeployment": {
"name": "test-drift-multiple-paths-service",
"namespace": "cluster-fleet-default-imported-2-945038eba7ea"
},
"namespace": "cluster-fleet-default-imported-2-945038eba7ea",
"name": "test-drift-multiple-paths-service",
"reconcileID": "308f9b4f-fb0f-48ca-a273-2e475b183966",
"status": {
"conditions": [
{
"type": "Installed",
"status": "True",
"lastUpdateTime": "2024-07-10T09:01:20Z"
},
{
"type": "Deployed",
"status": "False",
"lastUpdateTime": "2024-07-10T09:02:36Z",
"reason": "Error",
"message": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\""
},
{
"type": "Ready",
"status": "True",
"lastUpdateTime": "2024-07-10T09:02:36Z"
},
{
"type": "Monitored",
"status": "True",
"lastUpdateTime": "2024-07-10T09:01:20Z"
}
],
"appliedDeploymentID": "s-e900fb60b86d8593e95a733a0c0d1794f2d71a00910f794d19bcd4d57deca:aa73273923fd2b194b95dc51be330a7b1be92dafa689e0afb400abda8b37d8c0",
"release": "test-fleet-mp-service/test-drift-multiple-paths-service:1",
"ready": true,
"nonModified": true,
"display": {
"deployed": "Error: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
"monitored": "True",
"state": "Ready"
},
"syncGeneration": 0
},
"error": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
"errorVerbose": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"\nhelm.sh/helm/v3/pkg/kube.(*Client).Update\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/kube/client.go:438\nhelm.sh/helm/v3/pkg/action.(*Install).performInstall\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/action/install.go:456\nhelm.sh/helm/v3/pkg/action.(*Install).performInstallCtx.func1\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/action/install.go:421\nruntime.goexit\n\t/home/runner/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695",
"stacktrace": "github.com/rancher/fleet/internal/cmd/agent/controller.(*BundleDeploymentReconciler).Reconcile\n\t/home/runner/work/fleet/fleet/internal/cmd/agent/controller/bundledeployment_controller.go:129\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"
}
{
"level": "error",
"ts": "2024-07-10T09:02:36Z",
"msg": "Reconciler error",
"controller": "bundledeployment",
"controllerGroup": "fleet.cattle.io",
"controllerKind": "BundleDeployment",
"BundleDeployment": {
"name": "test-drift-multiple-paths-service",
"namespace": "cluster-fleet-default-imported-2-945038eba7ea"
},
"namespace": "cluster-fleet-default-imported-2-945038eba7ea",
"name": "test-drift-multiple-paths-service",
"reconcileID": "308f9b4f-fb0f-48ca-a273-2e475b183966",
"error": "failed deploying bundle: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
"errorCauses": [
{
"error": "failed deploying bundle: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\""
}
],
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"
}
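In the first log entry, the status reports `Ready` as `True`, `nonModified` as `true` and a display state of `Ready`, while the `Deployed` condition is `False` and carries the actual error. A small sketch of how such a status could be scanned for failing conditions follows; the `failed_conditions` helper is hypothetical and the status dict is trimmed from the log above.

```python
def failed_conditions(status):
    """Return (type, message) pairs for every condition whose status is "False"."""
    return [
        (cond["type"], cond.get("message", ""))
        for cond in status.get("conditions", [])
        if cond.get("status") == "False"
    ]


# Trimmed copy of the "status" object from the first log entry.
status = {
    "conditions": [
        {"type": "Installed", "status": "True"},
        {
            "type": "Deployed",
            "status": "False",
            "reason": "Error",
            "message": 'cannot patch "mp-app-service" with kind Service: '
                       'Service "mp-app-service" is invalid: '
                       'spec.ports[1].name: Duplicate value: "required-name2"',
        },
        {"type": "Ready", "status": "True"},
        {"type": "Monitored", "status": "True"},
    ],
    "ready": True,
    "nonModified": True,
}

failed = failed_conditions(status)
for cond_type, message in failed:
    print(f"{cond_type}: {message}")
```

Checking only `ready` or the `Ready` condition would miss the failure in this case, which is why the error needs to be looked up on the `Deployed` condition.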
Additional QA
Problem
When failing to correct drift on a resource (e.g. a modified ports array on a service), Fleet would leave a GitRepo in Modified state, with no error on the corresponding bundle deployment status.
Solution
- The drift correction error is now reflected in the bundle deployment status, which should in turn propagate it to the GitRepo status
- Setting force: true on the GitRepo resolves the error by deleting and recreating the Helm release for the bundle deployment, hence recreating the service in this case.
Testing
See reproduction steps above, in the issue description.
Engineering Testing
Manual Testing
- Created a GitRepo with drift correction enabled (but not set to force mode) pointing to rancher/fleet-test-data's multiple-paths
- Edited the created service ports
- Checked status of the GitRepo and bundle deployment
- Updated the GitRepo drift correction mode to true
- Saw the GitRepo and bundle deployment status error disappear, once the service had been recreated.
Automated Testing
- Integration tests cover propagation of the drift correction error to the bundle deployment status
- End-to-end tests verify that when a bundle is marked as modified, patching a GitRepo to set its correctDrift.force option to true eventually updates the bundle status, in that the bundle will no longer appear as modified.
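The patch mentioned in the end-to-end test can be pictured as a JSON merge patch that only touches `correctDrift.force`. The sketch below applies such a patch to an in-memory dict following RFC 7386 semantics; it does not talk to a cluster, and the starting `gitrepo_before` object is a trimmed, assumed example rather than the resource used by the tests.

```python
import copy
import json


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch to `target` and return the result."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # A null value in a merge patch removes the key.
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Assumed starting point: drift correction enabled, force mode not set.
gitrepo_before = {
    "spec": {
        "repo": "https://github.com/rancher/fleet-test-data",
        "paths": ["multiple-paths"],
        "correctDrift": {"enabled": True},
    }
}

patch = {"spec": {"correctDrift": {"force": True}}}
gitrepo_after = json_merge_patch(gitrepo_before, patch)

print(json.dumps(patch))
print(json.dumps(gitrepo_after["spec"]["correctDrift"]))
```

The same payload could be sent to the cluster as a merge-type patch, for instance with `kubectl patch --type merge`; the other fields of the spec stay untouched.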
QA Testing Considerations
- Test how the GitRepo status is reflected in the Rancher UI
Regressions Considerations
N/A
Rechecked in v2.10-212d8b6e92992235d791d8f2aaea8436ab4f6b77-head with fleet:105.0.0+up0.11.0-rc.2 and the problem persists.
Tried exact reproduction steps and error persisted:
- Created a GitRepo with drift correction enabled (but not set to force mode) pointing to rancher/fleet-test-data's multiple-paths
- Edited the created service ports
- Checked status of the GitRepo and bundle deployment
- Updated the GitRepo drift correction mode to true
As a relevant note, when updating the ports the following log appears:
"Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff.
Extra notes: there are UI issues when updating ports via the UI and when editing YAML. Setting it back to backlog
The above warning has been fixed through #3045.
Tried to reproduce this issue against Rancher v2.10.0-rc.3 with Fleet v0.11.0, without success: when updating the GitRepo to set its correctDrift force mode to true, drift is corrected and the GitRepo is set back from Modified to Active after a few seconds.
I re-checked it in Rancher 2.10 with fleet:105.0.1+up0.11.1 and can still reproduce it: after changing the port and pressing Force Update on the GitRepo, the error still occurs:
Nevertheless, as discussed offline, we agreed to close it and not spend more time on this issue, as it is a corner case